` container (ChatInput.tsx line 312) currently holds: `[voice button] [submit button]`.
+
+After this phase, when `showLocalTranscribe && localTranscribeHook.isSupported`:
+
+```
+[PrivacyBadge] [RecordingTimer?] [LocalTranscribeButton] [SubmitButton]
+```
+
+- `PrivacyBadge` always renders (when local transcribe is active and supported)
+- `RecordingTimer` conditionally renders only during `recording` state
+- `LocalTranscribeButton` and `SubmitButton` unchanged from Phase 3/4
+
+The `gap-1` (4px) between items provides compact spacing consistent with the existing button row.
+
+Source: ChatInput.tsx lines 312-346, CONTEXT.md D-01
+
+### Worker -- Silence Detection (no visual component)
+
+No new UI component. The Worker gains two internal checks that produce a `{ status: 'silence' }` message. The hook maps this status to a `toast.info()` notification.
+
+**Worker message format extension:**
+
+Current statuses: `ready`, `result`, `error`
+New status: `silence`
+
+```typescript
+{ status: 'silence' }
+```
+
+**RMS energy threshold:** Tunable constant in Worker. Recommended starting value: `0.01` (RMS of Float32Array samples). Below this threshold, audio is considered silence and transcription is skipped entirely.
+
+**Hallucination filter:** Post-transcription check for known Whisper silence outputs. Match against patterns:
+- Text length <= 5 characters (after trim)
+- Known hallucination strings: "Thank you.", "Thanks for watching.", "(music)", "...", "(silence)", "You", "Untertitel", single punctuation marks
+- Repetitive phrases: same word/phrase repeated 3+ times
+
+If any pattern matches, Worker returns `{ status: 'silence' }` instead of `{ status: 'result', text: '...' }`.
+
+Source: CONTEXT.md D-07, D-08, D-09, REQUIREMENTS.md ERR-05
+
+---
+
+## Interaction Contracts
+
+### Recording with Timer
+
+1. User clicks mic button -> recording begins (existing flow from Phase 3)
+2. Hook exposes `elapsedSeconds` as reactive state (derived from existing `startTimeRef` + 100ms interval, converted to whole seconds)
+3. RecordingTimer appears to the left of the button, showing "0:00 / 2:00"
+4. Timer updates every second (derived from the existing 100ms interval, but UI re-renders only when the second changes)
+5. At 1:45 elapsed (105s), timer text turns red (`text-red-600`)
+6. At 2:00 elapsed (120s), recording auto-stops (existing behavior), timer disappears, transcription begins
+7. If user manually stops before 2:00, timer disappears, transcription begins
+
+### Privacy Badge Presence
+
+1. Local transcription extension is active on assistant and `isSupported === true`
+2. PrivacyBadge renders immediately, before any user interaction
+3. Badge remains visible through all states: idle, downloading, loading, recording, transcribing, error
+4. Hovering/focusing the badge shows tooltip with full privacy explanation
+5. Badge provides no interactive behavior (not clickable, no state changes)
+
+### Silence Detection Flow
+
+1. Recording completes, audio sent to Worker for transcription (existing flow)
+2. Worker receives audio Float32Array
+3. **Layer 1 -- RMS check:** Worker computes RMS energy of audio samples. If RMS < threshold (0.01), Worker returns `{ status: 'silence' }` immediately. Transcription is skipped (saves compute time).
+4. **Layer 2 -- Hallucination filter:** If RMS >= threshold, Worker runs transcription normally. After transcription, Worker checks output against hallucination patterns. If match found, Worker returns `{ status: 'silence' }` instead of `{ status: 'result' }`.
+5. Hook receives `{ status: 'silence' }` -- same handling as `{ status: 'result', text: '' }`: calls `toast.info()` with silence-specific message, returns to idle state, does NOT insert text into chat input.
+
+Source: CONTEXT.md D-07 through D-10, REQUIREMENTS.md ERR-05
+
+---
+
+## Copywriting Contract
+
+| Element | English (en) | German (de) |
+|---------|-------------|-------------|
+| Privacy badge text | Local | Lokal |
+| Privacy badge tooltip | Audio is processed locally and never leaves your browser | Audio wird lokal verarbeitet und verlässt niemals Ihren Browser |
+| Silence detected toast | No speech detected. Try speaking louder or closer to the microphone. | Keine Sprache erkannt. Versuchen Sie, lauter oder näher am Mikrofon zu sprechen. |
+| Timer aria-label | Recording timer | Aufnahme-Timer |
+
+**i18n Key Mapping (all under `texts.chat.localTranscribe.*`):**
+
+| i18n Key | Status | Element |
+|----------|--------|---------|
+| `privacyBadge` | **New** | Badge visible text |
+| `privacyTooltip` | **New** | Badge hover tooltip |
+| `silenceDetected` | **New** | Toast message for silence/hallucination |
+| `timerLabel` | **New** | Aria-label for timer container |
+
+**New keys total:** 4 keys in English, 4 keys in German.
+
+**Relationship to existing keys:**
+- `emptyTranscription` (existing) -- used when Worker returns empty text (no audio data). Distinct from `silenceDetected` which is used when Worker explicitly identifies silence via RMS check or hallucination filter.
+- Both `emptyTranscription` and `silenceDetected` use `toast.info()` and return to idle.
+
+**Copywriting distinction:**
+- `emptyTranscription`: "No speech could be recognized" -- implies the system tried but failed to find speech
+- `silenceDetected`: "No speech detected" -- implies the system detected the absence of speech signal, a more definitive assessment
+
+Source: CONTEXT.md D-06, D-10, en.ts lines 191-212, de.ts lines 194-216
+
+---
+
+## Accessibility Contract
+
+| Element | ARIA attribute | Value |
+|---------|---------------|-------|
+| RecordingTimer container | `aria-label` | `texts.chat.localTranscribe.timerLabel` |
+| RecordingTimer container | `aria-live` | `off` (prevents disruptive per-second announcements) |
+| PrivacyBadge container | `data-tooltip-id` | `default` (existing tooltip system) |
+| PrivacyBadge container | `data-tooltip-content` | `texts.chat.localTranscribe.privacyTooltip` |
+| PrivacyBadge container | `tabIndex` | `0` (focusable for keyboard tooltip access) |
+| All existing elements | unchanged | See Phase 3/4 UI-SPEC |
+
+The silence detection toast uses `toast.info()` which inherits react-toastify's built-in `role="status"` and `aria-live="polite"` attributes, consistent with the existing `emptyTranscription` toast pattern.
+
+The RecordingTimer intentionally uses `aria-live="off"` because:
+1. The timer updates every second, which would flood screen reader output
+2. The auto-stop at 2:00 already produces a `toast.info()` announcement (maxDurationReached)
+3. Users who need to know the time limit can read the timer text at their discretion
+
+Source: Phase 3/4 UI-SPEC accessibility contract, react-toastify defaults
+
+---
+
+## Worker Communication Contract Extension
+
+The Worker message format is extended with a new `silence` status code.
+
+**Current statuses (from Phase 3/4):**
+
+| Status | Fields | Meaning |
+|--------|--------|---------|
+| `ready` | `{ status: 'ready' }` | Model loaded successfully |
+| `result` | `{ status: 'result', text: string }` | Transcription complete |
+| `error` | `{ status: 'error', error: string, code: string }` | Failure occurred |
+
+**New status (Phase 5):**
+
+| Status | Fields | Meaning |
+|--------|--------|---------|
+| `silence` | `{ status: 'silence' }` | Audio contained no detectable speech (RMS below threshold or hallucination pattern matched) |
+
+**Hook mapping for `silence` status:**
+```
+status === 'silence' -> toast.info(texts.chat.localTranscribe.silenceDetected)
+ -> setState('idle')
+ -> do NOT call onTranscriptReceived
+```
+
+Source: CONTEXT.md D-08, D-09, D-10, whisper.worker.ts current message format
+
+---
+
+## Registry Safety
+
+| Registry | Blocks Used | Safety Gate |
+|----------|-------------|-------------|
+| Not applicable | -- | -- |
+
+This phase uses no shadcn components and no third-party registries. All UI is built with Mantine components (ActionIcon, Group, Menu) and Tailwind utility classes, consistent with the existing codebase. The only new Tabler icon is `IconShieldCheck`.
+
+---
+
+## Checker Sign-Off
+
+- [ ] Dimension 1 Copywriting: PASS
+- [ ] Dimension 2 Visuals: PASS
+- [ ] Dimension 3 Color: PASS
+- [ ] Dimension 4 Typography: PASS
+- [ ] Dimension 5 Spacing: PASS
+- [ ] Dimension 6 Registry Safety: PASS
+
+**Approval:** pending
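
The two-layer silence detection specified above (the RMS check, then the hallucination filter) can be sketched as follows. This is a minimal illustration, not the Worker's actual code. `SILENCE_RMS_THRESHOLD`, `computeRMS`, and `isHallucination` follow names used in these planning documents, but their bodies are assumptions. `classify` and its synchronous `transcribe` callback are hypothetical stand-ins for the real asynchronous Whisper call, and the pattern list is only the subset quoted in the spec.

```typescript
// Sketch under stated assumptions; not the contents of whisper.worker.ts.
const SILENCE_RMS_THRESHOLD = 0.01;

// Subset of the known hallucination strings from the spec, lowercased.
const HALLUCINATION_PATTERNS: string[] = [
  'thank you.',
  'thanks for watching.',
  '(music)',
  '...',
  '(silence)',
  'you',
  'untertitel',
];

type WorkerResponse = { status: 'silence' } | { status: 'result'; text: string };

/** Root-mean-square energy of the samples; 0 for an empty buffer. */
function computeRMS(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumOfSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumOfSquares / samples.length);
}

/** Layer 2: true when the text looks like a typical silence hallucination. */
function isHallucination(text: string): boolean {
  const trimmed = text.trim().toLowerCase();
  // Very short output (<= 5 characters) is treated as noise.
  if (trimmed.length <= 5) return true;
  if (HALLUCINATION_PATTERNS.includes(trimmed)) return true;
  // Simplified repetition check: the same word repeated 3+ times.
  const words = trimmed.split(/\s+/);
  return words.length >= 3 && words.every((w) => w === words[0]);
}

/** Hypothetical wrapper: Layer 1 skips transcription, Layer 2 filters its output. */
function classify(
  samples: Float32Array,
  transcribe: (audio: Float32Array) => string,
): WorkerResponse {
  if (computeRMS(samples) < SILENCE_RMS_THRESHOLD) return { status: 'silence' };
  const text = transcribe(samples);
  return isHallucination(text) ? { status: 'silence' } : { status: 'result', text };
}
```

Running the cheap RMS check first means near-silent audio never reaches the model, which is the compute saving the spec calls out for Layer 1.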
diff --git a/.planning/milestones/v1.0-phases/05-polish-refinement/05-VALIDATION.md b/.planning/milestones/v1.0-phases/05-polish-refinement/05-VALIDATION.md
new file mode 100644
index 000000000..ce039f340
--- /dev/null
+++ b/.planning/milestones/v1.0-phases/05-polish-refinement/05-VALIDATION.md
@@ -0,0 +1,94 @@
+---
+phase: 5
+slug: polish-refinement
+status: compliant
+nyquist_compliant: true
+wave_0_complete: true
+created: 2026-05-08
+validated: 2026-05-08
+---
+
+# Phase 5 — Validation Strategy
+
+> Per-phase validation contract for feedback sampling during execution.
+
+---
+
+## Test Infrastructure
+
+| Property | Value |
+|----------|-------|
+| **Framework** | vitest 4.1.4 |
+| **Config file** | `frontend/vite.config.ts` (test section, lines 18-38) |
+| **Quick run command** | `cd frontend && npx vitest run` |
+| **Full suite command** | `cd frontend && npx vitest run` |
+| **Estimated runtime** | ~30 seconds |
+
+---
+
+## Sampling Rate
+
+- **After every task commit:** Run `cd frontend && npx vitest run`
+- **After every plan wave:** Run `cd frontend && npx vitest run`
+- **Before `/gsd-verify-work`:** Full suite must be green
+- **Max feedback latency:** 30 seconds
+
+---
+
+## Per-Task Verification Map
+
+| Task ID | Plan | Wave | Requirement | Threat Ref | Secure Behavior | Test Type | Automated Command | File Exists | Status |
+|---------|------|------|-------------|------------|-----------------|-----------|-------------------|-------------|--------|
+| 05-01-01 | 01 | 1 | UI-05 | — | N/A | unit | `cd frontend && npx vitest run src/hooks/useLocalTranscribe.ui-unit.spec.ts -t "elapsedSeconds"` | ✅ | ✅ green |
+| 05-01-02 | 01 | 1 | UI-05 | — | N/A | unit | `cd frontend && npx vitest run src/pages/chat/conversation/RecordingTimer.ui-unit.spec.tsx` | ✅ | ✅ green |
+| 05-01-03 | 01 | 1 | UI-05 | — | N/A | unit | `cd frontend && npx vitest run src/pages/chat/conversation/RecordingTimer.ui-unit.spec.tsx -t "warning"` | ✅ | ✅ green |
+| 05-01-04 | 01 | 1 | UI-06 | — | N/A | unit | `cd frontend && npx vitest run src/pages/chat/conversation/PrivacyBadge.ui-unit.spec.tsx` | ✅ | ✅ green |
+| 05-01-05 | 01 | 1 | ERR-05 | — | Transcription text auto-escaped by React | unit | `cd frontend && npx vitest run src/workers/whisper.worker.ui-unit.spec.ts -t "silence"` | ✅ | ✅ green |
+| 05-01-06 | 01 | 1 | ERR-05 | — | Worker same-origin, type-safe messages | unit | `cd frontend && npx vitest run src/workers/whisper.worker.ui-unit.spec.ts -t "hallucination"` | ✅ | ✅ green |
+| 05-01-07 | 01 | 1 | ERR-05 | — | N/A | unit | `cd frontend && npx vitest run src/hooks/useLocalTranscribe.ui-unit.spec.ts -t "silence"` | ✅ | ✅ green |
+
+*Status: ⬜ pending · ✅ green · ❌ red · ⚠️ flaky*
+
+---
+
+## Wave 0 Requirements
+
+- [x] `frontend/src/pages/chat/conversation/RecordingTimer.ui-unit.spec.tsx` — 8 tests for UI-05 (format, colors, accessibility)
+- [x] `frontend/src/pages/chat/conversation/PrivacyBadge.ui-unit.spec.tsx` — 5 tests for UI-06 (rendering, tooltip, focus, color)
+- [x] `frontend/src/hooks/useLocalTranscribe.ui-unit.spec.ts` — extended with 5 tests for `elapsedSeconds` state and `silence` status handling
+- [x] `frontend/src/workers/whisper.worker.ui-unit.spec.ts` — extended with 7 tests for RMS check, hallucination filter, `silence` status
+
+---
+
+## Manual-Only Verifications
+
+| Behavior | Requirement | Why Manual | Test Instructions |
+|----------|-------------|------------|-------------------|
+| Timer visual red color transition at 1:45 | UI-05 | CSS color rendering not verifiable in unit tests | 1. Start recording, 2. Wait until 1:45 elapsed, 3. Verify timer text turns red |
+| Privacy badge tooltip appears on hover | UI-06 | Tooltip hover interaction requires browser | 1. Hover over privacy badge, 2. Verify tooltip text appears |
+| Silence produces "No speech detected" toast | ERR-05 | Requires actual microphone silence + Whisper | 1. Start recording with mic muted, 2. Wait for auto-stop, 3. Verify toast shows |
+
+---
+
+## Validation Audit 2026-05-08
+
+| Metric | Count |
+|--------|-------|
+| Gaps found | 0 |
+| Resolved | 0 |
+| Escalated | 0 |
+
+All 7 tasks already have automated test coverage (25 new tests across 4 files). Full suite: 176 tests, 0 failures.
+
+---
+
+## Validation Sign-Off
+
+- [x] All tasks have an automated verify command or Wave 0 dependencies
+- [x] Sampling continuity: no 3 consecutive tasks without automated verify
+- [x] Wave 0 covers all MISSING references
+- [x] No watch-mode flags
+- [x] Feedback latency < 30s
+- [x] `nyquist_compliant: true` set in frontmatter
+
+**Approval:** approved 2026-05-08
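
The hook behavior covered by task 05-01-07 (the `silence` tests) can be pictured as a small dispatch over the Worker message union. The sketch below is an assumption-laden illustration rather than the real `useLocalTranscribe` code: `handleWorkerMessage`, `HookEffects`, and `ToastTexts` are hypothetical names, and the side effects are injected so the mapping can be exercised without React or react-toastify.

```typescript
// Hypothetical sketch of the hook's message mapping; not the real implementation.
type WorkerMessage =
  | { status: 'ready' }
  | { status: 'result'; text: string }
  | { status: 'error'; error: string; code: string }
  | { status: 'silence' };

interface HookEffects {
  toastInfo: (message: string) => void;
  setState: (state: 'idle') => void;
  onTranscriptReceived: (text: string) => void;
}

interface ToastTexts {
  silenceDetected: string;
  emptyTranscription: string;
}

function handleWorkerMessage(message: WorkerMessage, effects: HookEffects, texts: ToastTexts): void {
  switch (message.status) {
    case 'silence':
      // Explicit silence: inform the user, return to idle, insert nothing.
      effects.toastInfo(texts.silenceDetected);
      effects.setState('idle');
      break;
    case 'result':
      if (message.text.trim() === '') {
        // Empty transcription keeps its own, pre-existing message.
        effects.toastInfo(texts.emptyTranscription);
      } else {
        effects.onTranscriptReceived(message.text);
      }
      effects.setState('idle');
      break;
    default:
      // 'ready' and 'error' handling is out of scope for this sketch.
      break;
  }
}
```

Keeping `silence` as its own status, rather than reusing an empty `result`, is what lets the two toast messages stay distinct.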
diff --git a/.planning/milestones/v1.0-phases/05-polish-refinement/05-VERIFICATION.md b/.planning/milestones/v1.0-phases/05-polish-refinement/05-VERIFICATION.md
new file mode 100644
index 000000000..73f26964f
--- /dev/null
+++ b/.planning/milestones/v1.0-phases/05-polish-refinement/05-VERIFICATION.md
@@ -0,0 +1,139 @@
+---
+phase: 05-polish-refinement
+verified: 2026-05-08T19:07:00Z
+status: passed
+score: 3/3
+overrides_applied: 0
+human_verification:
+ - test: "Recording timer counts up correctly during recording"
+ expected: "Timer starts at 0:00 / 2:00, counts up smoothly, turns red at 1:45, and auto-stops at 2:00"
+ why_human: "Timer animation, color transition timing, and digit stability require visual confirmation"
+ - test: "Privacy badge appearance and tooltip interaction"
+ expected: "Green shield icon with 'Local' text visible next to mic button; tooltip appears on hover/focus"
+ why_human: "Visual styling, icon rendering, and tooltip behavior cannot be verified programmatically"
+ - test: "Silence detection produces correct feedback"
+ expected: "Recording silence and stopping shows 'No speech detected' toast; no text inserted into chat input"
+ why_human: "End-to-end audio pipeline behavior requires real microphone interaction"
+ - test: "Normal transcription still works (regression)"
+ expected: "Speaking normally and stopping inserts transcribed text into chat input"
+ why_human: "Full audio pipeline from mic to Worker to UI requires running app"
+---
+
+# Phase 5: Polish & Refinement Verification Report
+
+**Phase Goal:** The feature feels production-ready with recording feedback, privacy communication, and edge-case handling
+**Verified:** 2026-05-08T19:07:00Z
+**Status:** human_needed
+**Re-verification:** No -- initial verification
+
+## Goal Achievement
+
+### Observable Truths
+
+| # | Truth | Status | Evidence |
+|---|-------|--------|----------|
+| 1 | A recording timer shows elapsed time relative to the 2-minute maximum (e.g. "0:42 / 2:00") while recording | VERIFIED | `RecordingTimer.tsx` renders `{formatTime(elapsedSeconds)} / {formatTime(maxSeconds)}` (line 25). Format function (lines 12-15) produces M:SS via `Math.floor(seconds/60)` + `padStart(2, '0')`. Hook exposes `elapsedSeconds` (line 386) updated every 100ms via `Math.floor(elapsed / 1000)` (line 109). ChatInput renders timer conditionally on `isRecording` with `maxSeconds={120}` (lines 328-333). Timer turns red at 105s via `WARNING_THRESHOLD = maxSeconds - 15` (line 9). Tests confirm "0:42 / 2:00", "0:00 / 2:00", "2:00 / 2:00" formats and red/gray color transitions. |
+| 2 | A visual indicator communicates that audio is processed locally and never leaves the browser | VERIFIED | `PrivacyBadge.tsx` renders `IconShieldCheck` (size 14, green-700) + i18n `privacyBadge` text ("Local"/"Lokal") with `data-tooltip-content` set to `privacyTooltip` ("Audio is processed locally and never leaves your browser"). ChatInput renders the `PrivacyBadge` component when `showLocalTranscribe && localTranscribeHook.isSupported` (line 327) -- always visible, not just during recording. Badge has `tabIndex={0}` for keyboard accessibility. |
+| 3 | Recording silence (no speech signal) produces a "Keine Sprache erkannt" / "No speech detected" message instead of Whisper hallucination text | VERIFIED | Worker Layer 1 (lines 147-152): `computeRMS()` check with `SILENCE_RMS_THRESHOLD = 0.01`, returns `{ status: 'silence' }` for quiet audio. Worker Layer 2 (lines 162-166): `isHallucination()` filter after transcription catches 26 known en/de hallucination patterns, punctuation-only text, and repetitive words. Hook `case 'silence'` (lines 200-203): `toast.info(texts.chat.localTranscribe.silenceDetected)` + `setState('idle')`. en.ts: "No speech detected. Try speaking louder or closer to the microphone." de.ts: "Keine Sprache erkannt. Versuchen Sie, lauter oder näher am Mikrofon zu sprechen." |
+
+**Score:** 3/3 truths verified
+
+### Required Artifacts
+
+| Artifact | Expected | Status | Details |
+|----------|----------|--------|---------|
+| `frontend/src/pages/chat/conversation/RecordingTimer.tsx` | Timer display component exporting RecordingTimer | VERIFIED | 28 lines, exports `RecordingTimer`, M:SS format, red/gray color, tabular-nums, aria-live="off" |
+| `frontend/src/pages/chat/conversation/PrivacyBadge.tsx` | Privacy badge component exporting PrivacyBadge | VERIFIED | 18 lines, exports `PrivacyBadge`, IconShieldCheck, green-700, tooltip, tabIndex |
+| `frontend/src/workers/whisper.worker.ts` | RMS silence check and hallucination filter with computeRMS | VERIFIED | `computeRMS` (line 48), `isHallucination` (line 56), `SILENCE_RMS_THRESHOLD = 0.01` (line 15), `HALLUCINATION_PATTERNS` (26 entries, lines 17-46), two `postMessage({ status: 'silence' })` at lines 150 and 164 |
+| `frontend/src/hooks/useLocalTranscribe.ts` | elapsedSeconds state and silence status handler | VERIFIED | `elapsedSeconds` state (line 23), updated in interval (line 109), reset in cleanup (line 74) and beginRecording (line 102), `case 'silence'` handler (lines 200-203), returned in hook output (line 386) |
+| `frontend/src/texts/languages/en.ts` | 4 new i18n keys | VERIFIED | `silenceDetected` (line 212), `privacyBadge` (line 213), `privacyTooltip` (line 214), `timerLabel` (line 215) |
+| `frontend/src/texts/languages/de.ts` | 4 new i18n keys | VERIFIED | `silenceDetected` (line 216), `privacyBadge` (line 217), `privacyTooltip` (line 218), `timerLabel` (line 219) |
+| `frontend/src/texts/index.ts` | 4 translate() bridge calls | VERIFIED | Lines 242-245: translate() calls for silenceDetected, privacyBadge, privacyTooltip, timerLabel |
+| `frontend/src/pages/chat/conversation/RecordingTimer.ui-unit.spec.tsx` | RecordingTimer component tests | VERIFIED | 58 lines, 8 test cases covering format, colors, tabular-nums, aria-live |
+| `frontend/src/pages/chat/conversation/PrivacyBadge.ui-unit.spec.tsx` | PrivacyBadge component tests | VERIFIED | 56 lines, 5 test cases covering text, icon, tooltip, tabIndex, green color |
+| `frontend/src/workers/whisper.worker.ui-unit.spec.ts` | Extended Worker tests for silence detection | VERIFIED | 7 new tests for RMS below threshold, hallucination patterns (en/de), punctuation, repetitive text, legitimate text passthrough |
+| `frontend/src/hooks/useLocalTranscribe.ui-unit.spec.ts` | Extended hook tests for elapsedSeconds and silence | VERIFIED | 5 new tests: elapsedSeconds initial 0, elapsedSeconds updates during recording, silence toast.info, silence idle transition, silence callback suppression |
+
+### Key Link Verification
+
+| From | To | Via | Status | Details |
+|------|----|-----|--------|---------|
+| whisper.worker.ts | useLocalTranscribe.ts | Worker postMessage with status: 'silence' | WIRED | Worker posts `{ status: 'silence' }` at lines 150, 164. Hook handles `case 'silence'` at line 200. |
+| useLocalTranscribe.ts | ChatInput.tsx | elapsedSeconds in hook return value | WIRED | Hook returns `elapsedSeconds` (line 386). ChatInput accesses `localTranscribeHook.elapsedSeconds` (line 330). |
+| ChatInput.tsx | RecordingTimer.tsx | RecordingTimer component with elapsedSeconds prop | WIRED | ChatInput imports RecordingTimer (line 16), renders the `RecordingTimer` element (lines 329-332). |
+| ChatInput.tsx | PrivacyBadge.tsx | PrivacyBadge rendered when showLocalTranscribe && isSupported | WIRED | ChatInput imports PrivacyBadge (line 15), renders the `PrivacyBadge` component at line 327 inside the showLocalTranscribe branch. |
+| RecordingTimer.ui-unit.spec.tsx | RecordingTimer.tsx | import and render | WIRED | Test imports from `./RecordingTimer` (line 4), renders component in all 8 tests. |
+| PrivacyBadge.ui-unit.spec.tsx | PrivacyBadge.tsx | import and render | WIRED | Test imports from `./PrivacyBadge` (line 3), renders component in all 5 tests. |
+
+### Data-Flow Trace (Level 4)
+
+| Artifact | Data Variable | Source | Produces Real Data | Status |
+|----------|---------------|--------|--------------------|--------|
+| RecordingTimer.tsx | elapsedSeconds (prop) | useLocalTranscribe.ts -> Math.floor(elapsed / 1000) from Date.now() - startTimeRef | Yes -- real elapsed time from system clock | FLOWING |
+| PrivacyBadge.tsx | texts.chat.localTranscribe.privacyBadge (i18n) | texts/index.ts -> translate() -> en.ts/de.ts | Yes -- i18n string "Local"/"Lokal" | FLOWING |
+| ChatInput.tsx (silence path) | toast.info via hook | whisper.worker.ts computeRMS/isHallucination -> postMessage -> hook case 'silence' -> toast.info | Yes -- RMS computed from real Float32Array audio | FLOWING |
+
+### Behavioral Spot-Checks
+
+| Behavior | Command | Result | Status |
+|----------|---------|--------|--------|
+| All tests pass | `cd frontend && npx vitest run` | 176 tests passed, 29 test files, 0 failures | PASS |
+| Commits exist | `git log --oneline ff2f62e 81c845a 3e1349c` | All three commits found | PASS |
+| TypeScript compiles | Verified via pre-commit tsc --noEmit in commit 3e1349c | PASS (per SUMMARY.md) | PASS |
+
+### Requirements Coverage
+
+| Requirement | Source Plan | Description | Status | Evidence |
+|-------------|------------|-------------|--------|----------|
+| UI-05 | 05-01, 05-02 | Recording-Timer zeigt vergangene Zeit an (z.B. "0:42 / 2:00") | SATISFIED | RecordingTimer.tsx renders M:SS / M:SS format, tested with 8 unit tests |
+| UI-06 | 05-01, 05-02 | Privacy-Badge/Indikator zeigt an, dass Audio lokal verarbeitet wird | SATISFIED | PrivacyBadge.tsx renders shield icon + "Local" text with tooltip, tested with 5 unit tests |
+| ERR-05 | 05-01, 05-02 | Stille erkannt (kein Sprachsignal) -> "Keine Sprache erkannt" statt Whisper-Halluzination | SATISFIED | Two-layer silence detection in Worker (RMS + hallucination filter), hook maps to toast.info, tested with 12 combined Worker + hook tests |
+
+### Anti-Patterns Found
+
+| File | Line | Pattern | Severity | Impact |
+|------|------|---------|----------|--------|
+| (none) | - | - | - | No anti-patterns found in any phase-modified files |
+
+### Human Verification Required
+
+### 1. Recording Timer Visual Behavior
+
+**Test:** Start recording in the chat with a 'transcribe-local' assistant. Observe the timer display.
+**Expected:** Timer appears showing "0:00 / 2:00", counts up smoothly each second, digits do not cause layout shift (tabular-nums), timer text turns red at 1:45 elapsed, auto-stop toast appears at 2:00 and timer disappears.
+**Why human:** Timer animation smoothness, color transition timing, and layout stability require visual confirmation in the running app.
+
+### 2. Privacy Badge Appearance and Tooltip
+
+**Test:** Navigate to a chat with 'transcribe-local' extension active. Observe the badge next to the mic button.
+**Expected:** Green shield icon with "Local" text visible. Hover shows tooltip "Audio is processed locally and never leaves your browser". Tab-navigate to badge and verify focus ring appears.
+**Why human:** Visual styling, icon rendering quality, tooltip positioning, and keyboard focus behavior cannot be verified programmatically.
+
+### 3. Silence Detection End-to-End
+
+**Test:** Start recording while staying silent for a few seconds, then stop recording.
+**Expected:** Toast appears with "No speech detected. Try speaking louder or closer to the microphone." (or German equivalent). No text is inserted into the chat input.
+**Why human:** End-to-end audio pipeline from real microphone through Worker RMS check to toast requires running the app with actual hardware.
+
+### 4. Normal Transcription Regression
+
+**Test:** Start recording, speak normally, stop recording.
+**Expected:** Transcribed text appears in the chat input field.
+**Why human:** Full pipeline regression check requires real audio input and Whisper model inference.
+
+### Gaps Summary
+
+No technical gaps found. All three roadmap success criteria are verified at the code level:
+
+1. **Recording timer** -- RecordingTimer component renders M:SS / 2:00 format, wired through hook elapsedSeconds to ChatInput, conditionally shown during recording, turns red in last 15 seconds. 8 passing tests.
+2. **Privacy badge** -- PrivacyBadge component renders shield icon + "Local" text with tooltip, always visible when local transcribe active, wired into ChatInput. 5 passing tests.
+3. **Silence detection** -- Two-layer detection in Worker (RMS energy + hallucination filter), silence status handled in hook with toast.info, full i18n in en/de. 12 passing tests across Worker and hook specs.
+
+All 176 frontend tests pass. All three commits verified. No anti-patterns, no stubs, no orphaned artifacts.
+
+4 items require human visual/interactive verification before the phase can be marked as fully passed.
+
+---
+
+_Verified: 2026-05-08T19:07:00Z_
+_Verifier: Claude (gsd-verifier)_
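
The timer formatting and warning threshold described in the report (M:SS via `Math.floor(seconds / 60)` plus `padStart(2, '0')`, and a warning in the last 15 seconds) can be sketched as below. `formatTime` follows the name cited in the report. `timerText` and `isWarning` are hypothetical helpers added for illustration, and none of this is copied from `RecordingTimer.tsx`.

```typescript
// Illustrative sketch; function bodies are assumptions based on the report's description.

/** Formats whole seconds as M:SS, e.g. 42 -> "0:42". */
function formatTime(seconds: number): string {
  const minutes = Math.floor(seconds / 60);
  const rest = String(seconds % 60).padStart(2, '0');
  return `${minutes}:${rest}`;
}

/** Hypothetical helper: the "elapsed / max" text shown next to the mic button. */
function timerText(elapsedSeconds: number, maxSeconds: number): string {
  return `${formatTime(elapsedSeconds)} / ${formatTime(maxSeconds)}`;
}

/** Hypothetical helper: true once the last 15 seconds before the limit begin. */
function isWarning(elapsedSeconds: number, maxSeconds: number): boolean {
  return elapsedSeconds >= maxSeconds - 15;
}
```

With `maxSeconds = 120`, the warning starts at 105 seconds, which matches the 1:45 mark named in the spec.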
diff --git a/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-01-PLAN.md b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-01-PLAN.md
new file mode 100644
index 000000000..e955e7d60
--- /dev/null
+++ b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-01-PLAN.md
@@ -0,0 +1,334 @@
+---
+phase: 06-tech-debt-documentation-code-cleanup
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified:
+ - frontend/src/hooks/useLocalTranscribe.ts
+ - frontend/src/workers/whisper.worker.ts
+ - frontend/src/pages/chat/conversation/LocalTranscribeButton.tsx
+ - frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx
+ - frontend/src/pages/chat/conversation/PrivacyBadge.tsx
+ - frontend/src/pages/chat/conversation/RecordingTimer.tsx
+ - frontend/src/lib/audio-utils.ts
+ - backend/src/extensions/other/local-transcribe.ts
+autonomous: true
+requirements:
+ - PHASE-06-SC1
+ - PHASE-06-SC2
+ - PHASE-06-SC3
+
+must_haves:
+ truths:
+ - "All planning reference suffixes (D-04, D-08, D-09, D-03, D-05, AUDIO-03) are removed from comments while explanatory text is preserved"
+ - "All ESLint and Prettier violations in local transcription files are resolved"
+ - "Exported types and component props interfaces have JSDoc comments following codebase minimal patterns"
+ - "No dead code, unused imports, or redundant abstractions remain in local transcription modules"
+ - "All 60 existing tests (55 frontend + 5 backend) continue to pass"
+ artifacts:
+ - path: "frontend/src/hooks/useLocalTranscribe.ts"
+ provides: "Main hook with clean comments and JSDoc on exported types"
+ contains: "export type LocalTranscribeState"
+ - path: "frontend/src/workers/whisper.worker.ts"
+ provides: "Worker with clean comments and fixed Prettier formatting"
+ contains: "TranscriberPipeline"
+ - path: "frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx"
+ provides: "Component with fixed ESLint/Prettier violations"
+ contains: "DownloadProgressBanner"
+ key_links:
+ - from: "frontend/src/hooks/useLocalTranscribe.ts"
+ to: "frontend/src/workers/whisper.worker.ts"
+ via: "Worker message interface"
+ pattern: "type.*load.*transcribe"
+ - from: "frontend/src/pages/chat/conversation/LocalTranscribeButton.tsx"
+ to: "frontend/src/hooks/useLocalTranscribe.ts"
+ via: "exported types imported by component"
+ pattern: "import.*LocalTranscribeState.*from.*useLocalTranscribe"
+---
+
+
+Clean up all local transcription source files: remove planning reference suffixes from comments, fix ESLint/Prettier violations, add JSDoc to exported interfaces and types, and verify no dead code remains.
+
+Purpose: Improve code quality and maintainability by removing implementation-phase artifacts from the codebase and ensuring all files pass linting without violations.
+Output: 8 cleaned source files that pass all lint checks and existing tests.
+
+
+
+@/Users/thma/repos/c4-genai-suite/.claude/get-shit-done/workflows/execute-plan.md
+@/Users/thma/repos/c4-genai-suite/.claude/get-shit-done/templates/summary.md
+
+
+
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/06-tech-debt-documentation-code-cleanup/06-CONTEXT.md
+@.planning/codebase/CONVENTIONS.md
+
+
+
+
+From frontend/src/hooks/useLocalTranscribe.ts:
+```typescript
+export type LocalTranscribeState = 'idle' | 'downloading' | 'loading' | 'recording' | 'transcribing' | 'error';
+
+export interface DownloadProgress {
+ loaded: number;
+ total: number;
+ percentage: number;
+}
+
+interface UseLocalTranscribeProps {
+ language: string;
+ onTranscriptReceived: (transcript: string) => void;
+ maxDurationMs?: number;
+}
+
+// Return type (implicit):
+{
+ state: LocalTranscribeState;
+ downloadProgress: DownloadProgress | null;
+ isSupported: boolean;
+ isRecording: boolean;
+ isTranscribing: boolean;
+ isDownloading: boolean;
+ toggleRecording: () => Promise<void>;
+ cancelDownload: () => void;
+ elapsedSeconds: number;
+}
+```
+
+From frontend/src/workers/whisper.worker.ts:
+```typescript
+interface WorkerMessageData {
+ type: 'load' | 'transcribe';
+ audio?: Float32Array;
+ language?: string;
+}
+```
+
+
+
+
+
+
+ Task 1: Remove planning references and fix lint violations across all source files
+
+ frontend/src/hooks/useLocalTranscribe.ts,
+ frontend/src/workers/whisper.worker.ts,
+ frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx,
+ frontend/src/pages/chat/conversation/LocalTranscribeButton.tsx,
+ frontend/src/pages/chat/conversation/PrivacyBadge.tsx,
+ frontend/src/pages/chat/conversation/RecordingTimer.tsx,
+ frontend/src/lib/audio-utils.ts,
+ backend/src/extensions/other/local-transcribe.ts
+
+
+ frontend/src/hooks/useLocalTranscribe.ts,
+ frontend/src/workers/whisper.worker.ts,
+ frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx,
+ frontend/src/pages/chat/conversation/PrivacyBadge.tsx,
+ .planning/codebase/CONVENTIONS.md
+
+
+ **Per D-03 — Planning reference removal.** Strip the parenthetical planning-phase suffixes from all comments. Preserve the explanatory text that precedes each suffix. Specific edits:
+
+ In `useLocalTranscribe.ts`:
+ - Line 159: `// Aggregate download progress (D-08)` -> `// Aggregate download progress`
+ - Line 181: `// User clicked record during download -- auto-start recording (D-04)` -> `// User clicked record during download -- auto-start recording`
+ - Line 277: `// Transfer audio to Worker with Transferable (zero-copy) (AUDIO-03)` -> `// Transfer audio to Worker with Transferable (zero-copy)`
+ - Line 322: `// Mic available -- trigger download and set pending (D-04)` -> `// Mic available -- trigger download and set pending`
+ - Line 340: `// Do nothing for 'downloading', 'loading', 'transcribing' (D-05)` -> `// Do nothing for 'downloading', 'loading', 'transcribing'`
+ - Line 343: `// Cancel an in-progress model download (D-03)` -> `// Cancel an in-progress model download`
+
+ In `whisper.worker.ts`:
+ - Line 147: `// Layer 1: RMS energy check (D-08)` -> `// Layer 1: RMS energy check`
+ - Line 162: `// Layer 2: Hallucination filter (D-09)` -> `// Layer 2: Hallucination filter`
+
+ In `DownloadProgressBanner.tsx`:
+ - Line 17: `// D-04: When download completes (isDownloading transitions to false), show "Ready!" briefly` -> `// When download completes (isDownloading transitions to false), show "Ready!" briefly`
+
+ **Lint/Prettier fixes (discovered via `cd frontend && npx eslint`):**
+
+ In `DownloadProgressBanner.tsx`:
+ - Fix import order: ESLint reports `react import should occur after import of @tabler/icons-react`, so the `import/order` rule expects external packages in alphabetical order by package name. Reorder the imports to `{ ActionIcon, Progress } from '@mantine/core'`, then `{ IconX } from '@tabler/icons-react'`, then `{ useEffect, useState } from 'react'`.
+ - Fix `react-hooks/set-state-in-effect` warning on line 20: The `setShowReady(true)` inside useEffect is flagged. Refactor to derive `showReady` from props instead of state: remove the `showReady` state, track the previous `isDownloading` value with a ref, and compute the "ready" state from the ref + current prop. Specifically:
+ ```typescript
+ const wasDownloadingRef = useRef(isDownloading);
+ const [dismissed, setDismissed] = useState(false);
+
+ const showReady = wasDownloadingRef.current && !isDownloading;
+
+ useEffect(() => {
+ wasDownloadingRef.current = isDownloading;
+ }, [isDownloading]);
+
+ useEffect(() => {
+ if (showReady) {
+ const timer = setTimeout(() => setDismissed(true), 1500);
+ return () => clearTimeout(timer);
+ }
+ }, [showReady]);
+
+ if (dismissed) return null;
+ ```
+ - Fix Prettier violations: collapse multi-line div attributes onto single line where Prettier expects it, reorder CSS classes in `whitespace-nowrap text-sm` to `text-sm whitespace-nowrap`.
+
+ In `PrivacyBadge.tsx`:
+ - Fix Prettier: collapse the multi-line span content `{texts.chat.localTranscribe.privacyBadge}` onto one line.
+
+ In `whisper.worker.ts`:
+ - Fix Prettier: wrap arrow function parameters in parentheses for `p => trimmed.toLowerCase()` and `w => w === words[0]` (use `(p)` and `(w)`).
+ - Fix Prettier: collapse multi-line ternary in error handling (lines 124-126) onto single line.
+
+ **Dead code / unused imports check:** Run `cd frontend && npx eslint --no-error-on-unmatched-pattern src/hooks/useLocalTranscribe.ts src/workers/whisper.worker.ts src/pages/chat/conversation/LocalTranscribeButton.tsx src/pages/chat/conversation/DownloadProgressBanner.tsx src/pages/chat/conversation/PrivacyBadge.tsx src/pages/chat/conversation/RecordingTimer.tsx src/lib/audio-utils.ts` after edits and fix any remaining violations. Also run `cd backend && npx eslint src/extensions/other/local-transcribe.ts` for the backend file.
+
+
+ cd /Users/thma/repos/c4-genai-suite/frontend && npx eslint src/hooks/useLocalTranscribe.ts src/workers/whisper.worker.ts src/pages/chat/conversation/LocalTranscribeButton.tsx src/pages/chat/conversation/DownloadProgressBanner.tsx src/pages/chat/conversation/PrivacyBadge.tsx src/pages/chat/conversation/RecordingTimer.tsx src/lib/audio-utils.ts && echo "ESLint: PASS"
+
+
+ - grep -c 'D-04\|D-05\|D-08\|D-09\|D-03\|AUDIO-03' frontend/src/hooks/useLocalTranscribe.ts returns 0
+ - grep -c 'D-08\|D-09' frontend/src/workers/whisper.worker.ts returns 0
+ - grep -c 'D-04' frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx returns 0
+ - cd frontend && npx eslint src/hooks/useLocalTranscribe.ts src/workers/whisper.worker.ts src/pages/chat/conversation/LocalTranscribeButton.tsx src/pages/chat/conversation/DownloadProgressBanner.tsx src/pages/chat/conversation/PrivacyBadge.tsx src/pages/chat/conversation/RecordingTimer.tsx src/lib/audio-utils.ts exits 0
+ - cd backend && npx eslint src/extensions/other/local-transcribe.ts exits 0
+ - grep -c 'auto-start recording' frontend/src/hooks/useLocalTranscribe.ts returns 1 (explanatory text preserved)
+ - grep -c 'RMS energy check' frontend/src/workers/whisper.worker.ts returns 1 (explanatory text preserved)
+
+ All planning reference suffixes removed from 3 files (9 occurrences total). All Prettier/ESLint violations in local transcription files resolved. Backend extension file passes lint. Explanatory comment text preserved in every case.
+
+
+
+ Task 2: Add JSDoc to exported types and verify all tests pass
+
+ frontend/src/hooks/useLocalTranscribe.ts,
+ frontend/src/workers/whisper.worker.ts,
+ frontend/src/pages/chat/conversation/LocalTranscribeButton.tsx,
+ frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx,
+ frontend/src/lib/audio-utils.ts
+
+
+ frontend/src/hooks/useLocalTranscribe.ts,
+ frontend/src/pages/chat/conversation/LocalTranscribeButton.tsx,
+ frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx,
+ frontend/src/lib/audio-utils.ts,
+ backend/src/domain/extensions/interfaces.ts
+
+
+ **Per D-01 — JSDoc on exported types and component props, following the codebase pattern of minimal JSDoc on public interfaces only.**
+
+ Add JSDoc to exported types and interfaces in `useLocalTranscribe.ts`:
+ ```typescript
+ /** Represents the current state of the local transcription lifecycle. */
+ export type LocalTranscribeState = 'idle' | 'downloading' | 'loading' | 'recording' | 'transcribing' | 'error';
+
+ /** Tracks bytes loaded and total for the Whisper model download. */
+ export interface DownloadProgress {
+ loaded: number;
+ total: number;
+ percentage: number;
+ }
+
+ /** Configuration for the useLocalTranscribe hook. */
+ interface UseLocalTranscribeProps {
+ /** BCP 47 language code ('de' or 'en') passed to the Whisper worker. */
+ language: string;
+ /** Called with the transcribed text after successful transcription. */
+ onTranscriptReceived: (transcript: string) => void;
+ /** Maximum recording duration in milliseconds. Defaults to 2 minutes. */
+ maxDurationMs?: number;
+ }
+ ```
+
+ Add JSDoc to the exported function:
+ ```typescript
+ /**
+ * Hook that manages browser-based Whisper speech recognition.
+ * Handles model download, audio recording, and Worker-based transcription.
+ */
+ export function useLocalTranscribe(...)
+ ```
+
+ Add JSDoc to `LocalTranscribeButtonProps` in `LocalTranscribeButton.tsx`:
+ ```typescript
+ /** Props for the local transcription microphone button with language selector. */
+ interface LocalTranscribeButtonProps {
+ ```
+
+ Add JSDoc to `DownloadProgressBannerProps` in `DownloadProgressBanner.tsx`:
+ ```typescript
+ /** Props for the model download progress banner shown during first-time Whisper model download. */
+ interface DownloadProgressBannerProps {
+ ```
+
+ Add JSDoc to the exported function in `audio-utils.ts`:
+ ```typescript
+ /** Resamples an audio Blob to 16kHz mono Float32Array for Whisper inference. */
+ export async function resampleToMono16kHz(audioBlob: Blob): Promise<Float32Array> {
+ ```
+
+ Add JSDoc to `WorkerMessageData` in `whisper.worker.ts`:
+ ```typescript
+ /** Message types accepted by the Whisper Web Worker. */
+ interface WorkerMessageData {
+ ```
+
+ **Per D-05/D-06 — Hook structure assessment (Claude's discretion):**
+ After reviewing `useLocalTranscribe.ts` (388 lines, 10 refs), the hook should be kept as a single unit. The 10 refs exist because the Worker message handler must have stable identity (no dependency array changes), which requires refs for all values it accesses. Extracting sub-hooks like `useWorkerMessages` or `useRecording` would force either:
+ (a) passing 8+ refs between hooks (moving complexity, not reducing it), or
+ (b) recreating the Worker message handler on every render (losing the stable identity that prevents re-attaching listeners).
+ The 4 separate ref-sync effects (lines 47-61) follow the codebase pattern of single-purpose effects and should remain separate per D-06 discretion.
+
+ No refactoring of hook structure. A comment documenting this decision at the top of the hook was considered:
+ ```typescript
+ // Note: This hook intentionally uses refs for callback/prop synchronization
+ // to maintain a stable Worker message handler identity. See git history for rationale.
+ ```
+ Do NOT add this comment: per codebase convention (comments explain WHY, not WHAT, and avoid over-commenting), the ref pattern is standard React and self-explanatory. Just add the JSDoc on the exported function.
+
+ **Final verification:** Run all local transcription tests to confirm no regressions.
+
+
+ cd /Users/thma/repos/c4-genai-suite/frontend && npx vitest run src/hooks/useLocalTranscribe.ui-unit.spec.ts src/workers/whisper.worker.ui-unit.spec.ts src/pages/chat/conversation/LocalTranscribeButton.ui-unit.spec.tsx src/pages/chat/conversation/DownloadProgressBanner.ui-unit.spec.tsx src/pages/chat/conversation/PrivacyBadge.ui-unit.spec.tsx src/pages/chat/conversation/RecordingTimer.ui-unit.spec.tsx && echo "Frontend tests: PASS"
+
+
+ - grep -c '/\*\*' frontend/src/hooks/useLocalTranscribe.ts returns at least 4 (JSDoc on LocalTranscribeState, DownloadProgress, UseLocalTranscribeProps, useLocalTranscribe function)
+ - grep -c '/\*\*' frontend/src/pages/chat/conversation/LocalTranscribeButton.tsx returns at least 1
+ - grep -c '/\*\*' frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx returns at least 1
+ - grep -c '/\*\*' frontend/src/lib/audio-utils.ts returns at least 1
+ - cd frontend && npx vitest run src/hooks/useLocalTranscribe.ui-unit.spec.ts src/workers/whisper.worker.ui-unit.spec.ts src/pages/chat/conversation/LocalTranscribeButton.ui-unit.spec.tsx src/pages/chat/conversation/DownloadProgressBanner.ui-unit.spec.tsx src/pages/chat/conversation/PrivacyBadge.ui-unit.spec.tsx src/pages/chat/conversation/RecordingTimer.ui-unit.spec.tsx exits 0
+ - cd backend && NODE_OPTIONS="$NODE_OPTIONS --experimental-vm-modules" npx jest --runInBand --forceExit src/extensions/other/local-transcribe.spec.ts exits 0
+
+ JSDoc added to all exported types (LocalTranscribeState, DownloadProgress, UseLocalTranscribeProps), all exported functions (useLocalTranscribe, resampleToMono16kHz), all component props interfaces (LocalTranscribeButtonProps, DownloadProgressBannerProps), and the WorkerMessageData interface. All 60 tests (55 frontend + 5 backend) pass without regressions. Hook structure assessed and kept intact per D-05/D-06 discretion.
+
+
+
+
+
+## Trust Boundaries
+
+No new trust boundaries introduced. This phase only modifies comments, formatting, and documentation in existing files. No behavioral changes.
+
+## STRIDE Threat Register
+
+| Threat ID | Category | Component | Disposition | Mitigation Plan |
+|-----------|----------|-----------|-------------|-----------------|
+| T-06-01 | T (Tampering) | Source files | accept | Changes are comment/formatting only; existing tests verify no behavioral regression. No new code paths introduced. |
+
+
+
+1. All ESLint/Prettier violations resolved: `cd frontend && npx eslint src/hooks/useLocalTranscribe.ts src/workers/whisper.worker.ts src/pages/chat/conversation/LocalTranscribeButton.tsx src/pages/chat/conversation/DownloadProgressBanner.tsx src/pages/chat/conversation/PrivacyBadge.tsx src/pages/chat/conversation/RecordingTimer.tsx src/lib/audio-utils.ts` exits 0
+2. No planning references remain: `grep -rn '(D-[0-9]\|AUDIO-[0-9]' frontend/src/hooks/useLocalTranscribe.ts frontend/src/workers/whisper.worker.ts frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx` returns empty
+3. All 55 frontend tests pass: `cd frontend && npx vitest run src/hooks/useLocalTranscribe.ui-unit.spec.ts src/workers/whisper.worker.ui-unit.spec.ts src/pages/chat/conversation/LocalTranscribeButton.ui-unit.spec.tsx src/pages/chat/conversation/DownloadProgressBanner.ui-unit.spec.tsx src/pages/chat/conversation/PrivacyBadge.ui-unit.spec.tsx src/pages/chat/conversation/RecordingTimer.ui-unit.spec.tsx` exits 0
+4. All 5 backend tests pass: `cd backend && NODE_OPTIONS="$NODE_OPTIONS --experimental-vm-modules" npx jest --runInBand --forceExit src/extensions/other/local-transcribe.spec.ts` exits 0
+5. JSDoc present on exported types: `grep -c '/\*\*' frontend/src/hooks/useLocalTranscribe.ts` returns >= 4
+
+
+
+All 8 local transcription source files are clean: no planning reference suffixes, no lint violations, JSDoc on all exported public interfaces, all 60 existing tests pass unchanged.
+
+
+
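The comment edits listed under Task 1 of the plan above all follow one mechanical rule: drop a trailing `(D-NN)` or `(AUDIO-NN)` tag, or a leading `D-NN:` prefix, and keep the explanatory text. A minimal sketch of that rule follows. The helper name `stripPlanningRef` and both regular expressions are assumptions for illustration and are not part of the plan, which applies the edits by hand.

```typescript
// Hypothetical helper (not part of the plan): removes a planning-reference tag
// from a single comment line while keeping the explanatory text.

// Matches a trailing tag such as " (D-08)" or " (AUDIO-03)" at the end of the line.
const PLANNING_SUFFIX = /\s*\((?:D|AUDIO)-\d+\)\s*$/;

// Matches a leading tag such as "// D-04: " and captures the comment marker.
const PLANNING_PREFIX = /^(\s*\/\/\s*)(?:D|AUDIO)-\d+:\s*/;

function stripPlanningRef(line: string): string {
  // Other parentheticals such as "(zero-copy)" are left alone because the
  // suffix pattern only accepts a D-/AUDIO- identifier followed by digits.
  return line.replace(PLANNING_SUFFIX, '').replace(PLANNING_PREFIX, '$1');
}
```

Applied to the listed lines, this keeps `(zero-copy)` in the Transferable comment and only drops the final tag. Lines without a planning tag pass through unchanged.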
diff --git a/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-01-SUMMARY.md b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-01-SUMMARY.md
new file mode 100644
index 000000000..cc19b00e4
--- /dev/null
+++ b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-01-SUMMARY.md
@@ -0,0 +1,120 @@
+---
+phase: 06-tech-debt-documentation-code-cleanup
+plan: 01
+subsystem: frontend, backend
+tags: [eslint, prettier, jsdoc, code-cleanup, local-transcription, whisper, react-hooks]
+
+# Dependency graph
+requires:
+ - phase: 05-integration-testing-e2e-coverage
+ provides: "Fully tested local transcription feature with 84 frontend + 5 backend tests"
+provides:
+ - "Clean local transcription source files with no planning reference artifacts"
+ - "JSDoc documentation on all exported types, interfaces, and functions"
+ - "Zero ESLint/Prettier violations across all 8 local transcription files"
+affects: []
+
+# Tech tracking
+tech-stack:
+ added: []
+ patterns:
+ - "React render-phase state derivation pattern for detecting prop transitions without setState in effects"
+
+key-files:
+ created: []
+ modified:
+ - "frontend/src/hooks/useLocalTranscribe.ts"
+ - "frontend/src/workers/whisper.worker.ts"
+ - "frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx"
+ - "frontend/src/pages/chat/conversation/PrivacyBadge.tsx"
+ - "frontend/src/pages/chat/conversation/LocalTranscribeButton.tsx"
+ - "frontend/src/lib/audio-utils.ts"
+
+key-decisions:
+ - "Used render-phase state derivation (setState during render from props) instead of ref-in-render pattern to fix react-hooks/set-state-in-effect ESLint violation while preserving synchronous test behavior"
+ - "Hook structure assessed and kept intact per D-05/D-06 discretion: 10 refs necessary for stable Worker message handler identity"
+
+patterns-established:
+ - "Render-phase state derivation: track prevProp in state, compute derived state synchronously during render to avoid setState in effects"
+
+requirements-completed: [PHASE-06-SC1, PHASE-06-SC2, PHASE-06-SC3]
+
+# Metrics
+duration: 6min
+completed: 2026-05-08
+---
+
+# Phase 06 Plan 01: Local Transcription Code Cleanup Summary
+
+**Removed 9 planning reference suffixes, fixed 8 ESLint/Prettier violations, and added JSDoc to all exported types and functions across 8 local transcription files**
+
+## Performance
+
+- **Duration:** 6 min
+- **Started:** 2026-05-08T19:20:21Z
+- **Completed:** 2026-05-08T19:26:16Z
+- **Tasks:** 2
+- **Files modified:** 6
+
+## Accomplishments
+- Stripped all 9 planning reference suffixes (D-04, D-05, D-08, D-09, D-03, AUDIO-03) from 3 source files while preserving explanatory comment text
+- Resolved all 8 ESLint/Prettier violations across DownloadProgressBanner.tsx (import order, set-state-in-effect, Prettier formatting), PrivacyBadge.tsx (Prettier), and whisper.worker.ts (Prettier arrow params, ternary)
+- Added JSDoc to 8 exported types/interfaces/functions: LocalTranscribeState, DownloadProgress, UseLocalTranscribeProps (with property docs), useLocalTranscribe, LocalTranscribeButtonProps, DownloadProgressBannerProps, resampleToMono16kHz, WorkerMessageData
+- All 84 frontend tests and 5 backend tests pass without regressions
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Remove planning references and fix lint violations** - `efd2724` (style)
+2. **Task 2: Add JSDoc to exported types and verify all tests pass** - `19bf054` (docs)
+
+## Files Created/Modified
+- `frontend/src/hooks/useLocalTranscribe.ts` - Removed 6 planning refs, added JSDoc to 4 types + hook function with property-level docs
+- `frontend/src/workers/whisper.worker.ts` - Removed 2 planning refs, fixed 3 Prettier violations, added JSDoc to WorkerMessageData
+- `frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx` - Removed 1 planning ref, fixed import order + set-state-in-effect + Prettier, added JSDoc to props
+- `frontend/src/pages/chat/conversation/PrivacyBadge.tsx` - Fixed Prettier violation (collapsed multi-line span)
+- `frontend/src/pages/chat/conversation/LocalTranscribeButton.tsx` - Added JSDoc to props interface
+- `frontend/src/lib/audio-utils.ts` - Added JSDoc to resampleToMono16kHz function
+
+## Decisions Made
+- **DownloadProgressBanner refactor approach:** The plan suggested using a ref + derived value for detecting isDownloading transitions, but the `react-hooks/refs` ESLint rule forbids accessing refs during render. Used the React-recommended render-phase state derivation pattern (tracking previous prop value in state and calling setState during render) instead, which satisfies both `react-hooks/set-state-in-effect` and `react-hooks/refs` rules while preserving synchronous behavior for existing tests.
+- **Hook structure kept intact:** Assessed useLocalTranscribe.ts (388 lines, 10 refs) and confirmed the single-hook design is correct per D-05/D-06 discretion -- splitting would move complexity without reducing it due to the stable Worker message handler identity requirement.
+
+## Deviations from Plan
+
+### Auto-fixed Issues
+
+**1. [Rule 1 - Bug] DownloadProgressBanner ref-in-render approach incompatible with react-hooks/refs rule**
+- **Found during:** Task 1 (lint violation fixes)
+- **Issue:** Plan suggested using `wasDownloadingRef.current` in render to derive `showReady`, but the `react-hooks/refs` ESLint rule forbids accessing ref values during render
+- **Fix:** Used render-phase state derivation pattern: track `prevIsDownloading` in state and detect the transition synchronously during render, calling `setState` there so that React discards the in-progress output and immediately re-renders with the updated state. This avoids both `react-hooks/set-state-in-effect` and `react-hooks/refs` violations
+- **Files modified:** frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx
+- **Verification:** ESLint passes with zero violations, all 6 DownloadProgressBanner tests pass including the "should show Ready text when download completes" test
+- **Committed in:** efd2724 (Task 1 commit)
+
+---
+
+**Total deviations:** 1 auto-fixed (1 bug)
+**Impact on plan:** Auto-fix necessary because plan's suggested approach violated an ESLint rule not anticipated in planning. Final implementation uses a more React-idiomatic pattern. No scope creep.
+
+## Issues Encountered
+None
+
+## User Setup Required
+None - no external service configuration required.
+
+## Next Phase Readiness
+- All 8 local transcription files are clean and fully documented
+- Ready for plan 06-02 (remaining tech debt items)
+
+## Self-Check: PASSED
+
+- All 6 modified source files exist on disk
+- SUMMARY.md created at `.planning/phases/06-tech-debt-documentation-code-cleanup/06-01-SUMMARY.md`
+- Commit efd2724 (Task 1) verified in git log
+- Commit 19bf054 (Task 2) verified in git log
+
+---
+*Phase: 06-tech-debt-documentation-code-cleanup*
+*Completed: 2026-05-08*
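The render-phase state derivation described in the summary above can be modeled as a pure transition function. This is only a sketch under stated assumptions: the names `BannerState` and `deriveBannerState` are hypothetical, and the shipped `DownloadProgressBanner.tsx` is not shown in this diff, so its actual code may differ.

```typescript
// Hypothetical model of the banner state. In a React component these two
// values would live in useState rather than in a plain object.
interface BannerState {
  prevIsDownloading: boolean;
  showReady: boolean;
}

// Pure transition: given the state from the previous render and the current
// `isDownloading` prop, return the state the component should render with.
function deriveBannerState(state: BannerState, isDownloading: boolean): BannerState {
  if (state.prevIsDownloading === isDownloading) {
    // Prop unchanged: keep the same state object, no update needed.
    return state;
  }
  return {
    prevIsDownloading: isDownloading,
    // "Ready!" is shown only on the true -> false transition.
    showReady: state.prevIsDownloading && !isDownloading,
  };
}
```

In a component, the equivalent comparison would run during render and call the state setters only when the prop differs from the stored previous value, so no ref is read during render and no `setState` runs inside an effect.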
diff --git a/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-02-PLAN.md b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-02-PLAN.md
new file mode 100644
index 000000000..5da25aa0b
--- /dev/null
+++ b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-02-PLAN.md
@@ -0,0 +1,177 @@
+---
+phase: 06-tech-debt-documentation-code-cleanup
+plan: 02
+type: execute
+wave: 1
+depends_on: []
+files_modified:
+ - .planning/PROJECT.md
+ - .planning/REQUIREMENTS.md
+autonomous: true
+requirements:
+ - PHASE-06-SC1
+
+must_haves:
+ truths:
+ - "PROJECT.md references whisper-small q8 (~240MB) everywhere instead of whisper-base (~140MB)"
+ - "REQUIREMENTS.md references whisper-small q8 (~240MB) instead of whisper-base (~140MB)"
+ - "The Key Decisions table in PROJECT.md reflects the actual model choice with rationale"
+ artifacts:
+ - path: ".planning/PROJECT.md"
+ provides: "Accurate project documentation matching shipped code"
+ contains: "whisper-small"
+ - path: ".planning/REQUIREMENTS.md"
+ provides: "Accurate requirements matching shipped code"
+ contains: "whisper-small"
+ key_links:
+ - from: ".planning/PROJECT.md"
+ to: "frontend/src/workers/whisper.worker.ts"
+ via: "model name consistency"
+ pattern: "whisper-small"
+---
+
+
+Update PROJECT.md and REQUIREMENTS.md to accurately reflect the shipped model: whisper-small q8 (~240MB) instead of whisper-base (~140MB).
+
+Purpose: Per D-04, the code uses `onnx-community/whisper-small` with `dtype: 'q8'` but documentation still references whisper-base (~140MB). Align all documentation to match the actual implementation.
+Output: Two updated planning documents with correct model references.
+
+
+
+@/Users/thma/repos/c4-genai-suite/.claude/get-shit-done/workflows/execute-plan.md
+@/Users/thma/repos/c4-genai-suite/.claude/get-shit-done/templates/summary.md
+
+
+
+@.planning/PROJECT.md
+@.planning/REQUIREMENTS.md
+@.planning/phases/06-tech-debt-documentation-code-cleanup/06-CONTEXT.md
+
+
+
+
+From frontend/src/workers/whisper.worker.ts line 80:
+```typescript
+this.instance ??= pipeline('automatic-speech-recognition', 'onnx-community/whisper-small', {
+ dtype: 'q8',
+ device,
+ progress_callback,
+});
+```
+
+
+
+
+
+
+ Task 1: Update PROJECT.md model references from whisper-base to whisper-small q8
+ .planning/PROJECT.md
+
+ .planning/PROJECT.md,
+ frontend/src/workers/whisper.worker.ts
+
+
+ **Per D-04 — Update all whisper-base references in PROJECT.md to match shipped code.**
+
+ There are 7 occurrences to update across 5 sections:
+
+ 1. **"What This Is" section (line 5):**
+ Change: `die Whisper (whisper-base) via Transformers.js`
+ To: `die Whisper (whisper-small, quantisiert q8) via Transformers.js`
+
+ 2. **"Requirements > Active" (line 19):**
+ Change: `Lokale Whisper-Inferenz im Browser via Transformers.js (whisper-base Modell)`
+ To: `Lokale Whisper-Inferenz im Browser via Transformers.js (whisper-small q8 Modell)`
+
+ 3. **"Requirements > Active" (line 22):**
+ Change: `On-Demand-Download des Whisper-Modells (~140MB)`
+ To: `On-Demand-Download des Whisper-Modells (~240MB)`
+
+ 4. **"Out of Scope" (line 33):**
+ Change: `fest auf whisper-base, ggf. später konfigurierbar`
+ To: `fest auf whisper-small q8, ggf. später konfigurierbar`
+
+ 5. **"Context" paragraph (line 49):**
+ Change: `Das whisper-base Modell ist ca. 140MB groß`
+ To: `Das whisper-small q8 Modell ist ca. 240MB groß`
+
+ 6. **"Constraints" (line 53):**
+ Change: `whisper-base ist ~140MB`
+ To: `whisper-small q8 ist ~240MB`
+
+ 7. **"Key Decisions" table (line 63):**
+ Change: `whisper-base statt whisper-tiny | Bessere Genauigkeit bei akzeptabler Modellgröße (~140MB vs ~75MB) | — Pending`
+ To: `whisper-small q8 statt whisper-base | Bessere Genauigkeit bei akzeptabler Modellgröße (~240MB vs ~140MB), q8 Quantisierung für reduzierte Dateigröße | Implemented`
+
+
+ grep -v 'statt whisper-base' /Users/thma/repos/c4-genai-suite/.planning/PROJECT.md | grep -c 'whisper-base' && echo "FAIL: stale whisper-base reference still present" || echo "PASS: no stale whisper-base references"
+
+
+ - grep -c 'whisper-base' .planning/PROJECT.md returns 1 (only in the Key Decisions comparison: "whisper-small q8 statt whisper-base")
+ - grep -c 'whisper-small' .planning/PROJECT.md returns at least 5
+ - grep -c '~240MB\|240MB' .planning/PROJECT.md returns at least 3
+ - grep -c '~140MB' .planning/PROJECT.md returns 1 (only in the Key Decisions comparison column: "~240MB vs ~140MB")
+ - grep 'Implemented' .planning/PROJECT.md returns a line containing 'whisper-small q8'
+
+ All 7 occurrences of whisper-base/~140MB in PROJECT.md updated to whisper-small q8/~240MB. Key Decisions table updated with correct rationale and marked Implemented.
+
+
+
+ Task 2: Update REQUIREMENTS.md model references from whisper-base to whisper-small q8
+ .planning/REQUIREMENTS.md
+
+ .planning/REQUIREMENTS.md
+
+
+ **Per D-04 — Update whisper-base references in REQUIREMENTS.md.**
+
+ There are 2 occurrences to update:
+
+ 1. **MODEL-01 requirement (line 40):**
+ Change: `whisper-base Modell (~140MB) wird beim ersten Nutzen on-demand von Hugging Face Hub geladen`
+ To: `whisper-small q8 Modell (~240MB) wird beim ersten Nutzen on-demand von Hugging Face Hub geladen`
+
+ 2. **Out of Scope table (line 87):**
+ Change: `whisper-base ist der richtige Kompromiss`
+ To: `whisper-small q8 ist der richtige Kompromiss`
+
+
+ grep -c 'whisper-base' /Users/thma/repos/c4-genai-suite/.planning/REQUIREMENTS.md && echo "FAIL: whisper-base still present" || echo "PASS: no whisper-base references"
+
+
+ - grep -c 'whisper-base' .planning/REQUIREMENTS.md returns 0
+ - grep -c 'whisper-small q8' .planning/REQUIREMENTS.md returns at least 2
+ - grep -c '~240MB' .planning/REQUIREMENTS.md returns at least 1
+ - grep 'MODEL-01' .planning/REQUIREMENTS.md contains 'whisper-small q8'
+
+ Both occurrences of whisper-base in REQUIREMENTS.md updated to whisper-small q8. MODEL-01 now accurately describes the ~240MB model size. Out of Scope section reflects the correct model name.
+
+
+
+
+
+## Trust Boundaries
+
+No trust boundaries affected. This plan only modifies planning documentation files (.planning/), not application code.
+
+## STRIDE Threat Register
+
+| Threat ID | Category | Component | Disposition | Mitigation Plan |
+|-----------|----------|-----------|-------------|-----------------|
+| (none) | — | — | — | Documentation-only changes; no application code or runtime behavior affected. |
+
+
+
+1. No stale whisper-base references remain in PROJECT.md: `grep -c 'whisper-base' .planning/PROJECT.md` returns 1 (only the Key Decisions comparison "whisper-small q8 statt whisper-base")
+2. No whisper-base references remain in REQUIREMENTS.md: `grep -c 'whisper-base' .planning/REQUIREMENTS.md` returns 0
+3. Correct model name appears in both files: `grep -c 'whisper-small' .planning/PROJECT.md .planning/REQUIREMENTS.md` shows counts >= 5 and >= 2 respectively
+4. Documentation matches code: `grep 'whisper-small' frontend/src/workers/whisper.worker.ts` confirms the same model name
+
+
+
+PROJECT.md and REQUIREMENTS.md accurately reflect the shipped model (whisper-small q8, ~240MB) with zero remaining references to whisper-base or ~140MB (except in the Key Decisions comparison).
+
+
+
diff --git a/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-02-SUMMARY.md b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-02-SUMMARY.md
new file mode 100644
index 000000000..45d7e0fb8
--- /dev/null
+++ b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-02-SUMMARY.md
@@ -0,0 +1,104 @@
+---
+phase: 06-tech-debt-documentation-code-cleanup
+plan: 02
+subsystem: docs
+tags: [whisper, documentation, tech-debt, model-accuracy]
+
+# Dependency graph
+requires:
+ - phase: 05-ux-polish-edge-cases
+ provides: "Shipped whisper-small q8 model implementation"
+provides:
+ - "Accurate PROJECT.md reflecting whisper-small q8 model"
+ - "Accurate REQUIREMENTS.md reflecting whisper-small q8 model"
+affects: []
+
+# Tech tracking
+tech-stack:
+ added: []
+ patterns: []
+
+key-files:
+ created: []
+ modified:
+ - ".planning/PROJECT.md"
+ - ".planning/REQUIREMENTS.md"
+
+key-decisions:
+ - "Kept 'whisper-base' in Key Decisions comparison context ('whisper-small q8 statt whisper-base') as it describes the decision rationale"
+
+patterns-established: []
+
+requirements-completed: [PHASE-06-SC1]
+
+# Metrics
+duration: 2min
+completed: 2026-05-08
+---
+
+# Phase 06 Plan 02: Documentation Model Reference Alignment Summary
+
+**Aligned PROJECT.md and REQUIREMENTS.md to match shipped whisper-small q8 model (~240MB) across all references**
+
+## Performance
+
+- **Duration:** 2 min
+- **Started:** 2026-05-08T19:17:47Z
+- **Completed:** 2026-05-08T19:19:58Z
+- **Tasks:** 2
+- **Files modified:** 2
+
+## Accomplishments
+- Updated 7 occurrences in PROJECT.md from whisper-base (~140MB) to whisper-small q8 (~240MB)
+- Updated 3 occurrences in REQUIREMENTS.md from whisper-base to whisper-small q8
+- Key Decisions table now marked as Implemented with correct rationale including q8 quantization
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Update PROJECT.md model references** - `84b1ddb` (docs)
+2. **Task 2: Update REQUIREMENTS.md model references** - `8ad0db1` (docs)
+
+## Files Created/Modified
+- `.planning/PROJECT.md` - Updated 7 whisper-base references to whisper-small q8 across What This Is, Requirements, Out of Scope, Context, Constraints, and Key Decisions sections
+- `.planning/REQUIREMENTS.md` - Updated MODEL-01 description, Out of Scope table, and Multi-Speaker Diarization row to reference whisper-small q8
+
+## Decisions Made
+- Kept "whisper-base" in Key Decisions comparison context ("whisper-small q8 statt whisper-base") since it describes what the new choice replaced -- this is accurate historical context, not a stale reference
+
+## Deviations from Plan
+
+### Auto-fixed Issues
+
+**1. [Rule 1 - Bug] Fixed additional Whisper-base reference in REQUIREMENTS.md**
+- **Found during:** Task 2 (Update REQUIREMENTS.md)
+- **Issue:** Plan specified 2 occurrences to update, but a third, capitalized "Whisper-base" existed on line 91 in the Multi-Speaker Diarization Out of Scope row; the plan's case-sensitive `grep` for `whisper-base` did not match it
+- **Fix:** Updated "Whisper-base" to "Whisper-small" on line 91
+- **Files modified:** .planning/REQUIREMENTS.md
+- **Verification:** `grep -in 'whisper-base' .planning/REQUIREMENTS.md` returns no results
+- **Committed in:** 8ad0db1 (Task 2 commit)
+
+---
+
+**Total deviations:** 1 auto-fixed (1 bug fix)
+**Impact on plan:** Essential for complete documentation accuracy. No scope creep.
+
+## Issues Encountered
+None
+
+## User Setup Required
+None - no external service configuration required.
+
+## Next Phase Readiness
+- All planning documentation now accurately reflects the shipped whisper-small q8 model
+- No blockers for subsequent work
+
+## Self-Check: PASSED
+
+- All files exist on disk
+- All commit hashes verified in git log
+
+---
+*Phase: 06-tech-debt-documentation-code-cleanup*
+*Completed: 2026-05-08*
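The deviation recorded in the summary above came from a case-sensitive search missing "Whisper-base". A case-insensitive check that also tolerates the intentional "statt whisper-base" comparison could look like the sketch below. The function name `findStaleModelRefs` is hypothetical, and skipping the whole comparison line is a simplifying assumption: a line containing both the comparison and a second stale mention would be missed.

```typescript
// Hypothetical check (not part of the plan): returns the 1-based numbers of
// lines that still mention "whisper-base" in any letter case, ignoring lines
// that contain the intentional "statt whisper-base" comparison.
function findStaleModelRefs(text: string): number[] {
  const stale: number[] = [];
  text.split('\n').forEach((line, index) => {
    const lower = line.toLowerCase();
    if (lower.includes('whisper-base') && !lower.includes('statt whisper-base')) {
      stale.push(index + 1);
    }
  });
  return stale;
}
```

An empty result corresponds to the documentation being aligned. Any returned line number points at a reference that still needs updating.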
diff --git a/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-CONTEXT.md b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-CONTEXT.md
new file mode 100644
index 000000000..9c91dd99e
--- /dev/null
+++ b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-CONTEXT.md
@@ -0,0 +1,114 @@
+# Phase 6: Address Tech Debt: Documentation and Code Cleanup - Context
+
+**Gathered:** 2026-05-08
+**Status:** Ready for planning
+
+
+## Phase Boundary
+
+This phase improves code quality and maintainability of the local transcription feature (8 files, ~806 lines). It covers: adding appropriate JSDoc/module documentation, cleaning up planning-reference comments, resolving the whisper-base vs whisper-small documentation discrepancy, and assessing hook structure for potential refactoring. No new features, no behavioral changes.
+
+
+
+
+## Implementation Decisions
+
+### Documentation Style
+- **D-01:** Documentation level is **Claude's discretion**, following existing codebase patterns. The project convention is minimal JSDoc (only on public interfaces per Extension interface pattern), no over-commenting. Apply JSDoc to exported types (`LocalTranscribeState`, `DownloadProgress`, `UseLocalTranscribeProps`) and component props interfaces. No feature-level README — code should be self-documenting.
+- **D-02:** No standalone feature README or architecture document. The code should speak for itself through naming and structure.
+
+### Planning Reference Cleanup
+- **D-03:** **Remove decision references (D-04, D-08, AUDIO-03), keep the explanatory text.** Strip planning-phase suffixes like `(D-04)`, `(D-08)`, `(AUDIO-03)` from comments but preserve the intent-explaining text. E.g., `// auto-start recording after download` stays, `(D-04)` goes. Planning refs belong in commit history, not code.
+
+### Model Name Discrepancy
+- **D-04:** **Update PROJECT.md and REQUIREMENTS.md to reflect the actual model: whisper-small q8 (~240MB).** The code uses `onnx-community/whisper-small` with `dtype: 'q8'`, but docs still say whisper-base (~140MB). Align all documentation to match the shipped code. Document the rationale for the change.
+
+### Hook Structure
+- **D-05:** Whether to extract sub-hooks from `useLocalTranscribe` (388 lines, 10 refs) is **Claude's discretion.** Assess whether splitting genuinely improves clarity or just moves complexity around. The hook is tightly coupled — Worker triggers recording, recording feeds Worker — so splitting may not simplify anything.
+- **D-06:** Whether to consolidate the 4 ref-sync `useEffect` blocks into one or leave them separate is **Claude's discretion.** Follow whichever approach best matches existing codebase patterns.
+
+### Claude's Discretion
+- Documentation level matching existing codebase patterns (D-01)
+- Hook refactoring decision — extract sub-hooks or keep as one unit (D-05)
+- Ref-sync effect consolidation (D-06)
+- Identification and removal of any dead code, unused imports, or redundant abstractions discovered during cleanup
+- Ensuring consistent patterns across all local transcription modules
+
+
+
+
+## Canonical References
+
+**Downstream agents MUST read these before planning or implementing.**
+
+### Local Transcription Source Files (modify)
+- `frontend/src/hooks/useLocalTranscribe.ts` — Main hook (388 lines). Primary cleanup target for comments and potential refactoring.
+- `frontend/src/workers/whisper.worker.ts` — Web Worker (177 lines). Contains model reference (`onnx-community/whisper-small`), silence detection, hallucination filter.
+- `frontend/src/pages/chat/conversation/LocalTranscribeButton.tsx` — Button component (92 lines).
+- `frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx` — Download progress UI (65 lines).
+- `frontend/src/pages/chat/conversation/PrivacyBadge.tsx` — Privacy indicator (18 lines).
+- `frontend/src/pages/chat/conversation/RecordingTimer.tsx` — Recording timer (28 lines).
+- `frontend/src/lib/audio-utils.ts` — Audio resampling utility (21 lines).
+- `backend/src/extensions/other/local-transcribe.ts` — Backend extension registration (38 lines).
+
+### Integration Points (read-only, check for consistency)
+- `frontend/src/pages/chat/conversation/ChatInput.tsx` §188-349 — Integration point wiring all local transcription components.
+
+### Project Documentation (modify — model name fix)
+- `.planning/PROJECT.md` — Says "whisper-base (~140MB)" throughout. Must be updated to "whisper-small q8 (~240MB)".
+- `.planning/REQUIREMENTS.md` — References "whisper-base (~140MB)". Must be updated.
+
+### Codebase Conventions (read-only)
+- `.planning/codebase/CONVENTIONS.md` — Coding conventions, JSDoc guidelines, comment policy. Reference for documentation decisions.
+
+### Test Files (read-only, verify no breakage)
+- `frontend/src/hooks/useLocalTranscribe.ui-unit.spec.ts`
+- `frontend/src/workers/whisper.worker.ui-unit.spec.ts`
+- `frontend/src/pages/chat/conversation/LocalTranscribeButton.ui-unit.spec.tsx`
+- `frontend/src/pages/chat/conversation/DownloadProgressBanner.ui-unit.spec.tsx`
+- `frontend/src/pages/chat/conversation/PrivacyBadge.ui-unit.spec.tsx`
+- `frontend/src/pages/chat/conversation/RecordingTimer.ui-unit.spec.tsx`
+- `backend/src/extensions/other/local-transcribe.spec.ts`
+
+
+
+
+## Existing Code Insights
+
+### Reusable Assets
+- ESLint with `no-warning-comments: error` — enforces no TODOs/FIXMEs, already clean.
+- `knip.json` in frontend — dead code detection tool already configured.
+- Prettier formatting already enforced via lint-staged pre-commit hooks.
+
+### Established Patterns
+- JSDoc limited to public API interfaces (Extension interface pattern in `src/domain/extensions/interfaces.ts`).
+- Comments explain WHY, not WHAT. Non-obvious error handling and workarounds documented.
+- Separate ref-sync effects are idiomatic in the codebase (each effect is clear about its purpose).
+- Components follow Mantine + Tailwind composition pattern consistently.
+
+### Integration Points
+- ChatInput.tsx lines 188-349 wire all local transcription components together. Any interface changes during refactoring must maintain compatibility.
+- Test files import types from the source files — exported type changes need test updates.
+
+
+
+
+## Specific Ideas
+
+- Decision-reference comments follow a clear pattern: `// explanatory text (D-XX)` or `// explanatory text (AUDIO-XX)`. A systematic find-and-replace can strip the parenthetical suffixes.
+- PROJECT.md model references appear in: "What This Is", Requirements, Constraints, and Key Decisions sections — all need updating from whisper-base to whisper-small q8.
+- The hook's 10 refs serve a specific purpose (stable callback identity for Worker message handler). Before refactoring, verify that extraction wouldn't just move refs between hooks without reducing complexity.
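+
+The find-and-replace from the first idea can be sketched in TypeScript. This is a minimal illustration, assuming every reference is a trailing parenthetical of the form `(D-XX)` or `(AUDIO-XX)` at the end of a comment line. The helper name `stripPlanningRef`, the exact pattern, and the sample comments are assumptions, not code from the repository:
+
+```typescript
+// Hypothetical helper: removes a trailing planning reference such as
+// "(D-04)" or "(AUDIO-03)" from one line and keeps the text before it.
+const PLANNING_REF = /\s*\((?:D|AUDIO)-\d{2}\)\s*$/;
+
+function stripPlanningRef(line: string): string {
+  return line.replace(PLANNING_REF, '');
+}
+
+// Sample inputs modelled on the comment pattern described above.
+const cleaned = [
+  '// auto-start recording after download (D-04)',
+  '// RMS energy check (AUDIO-03)',
+  '// no reference here',
+].map(stripPlanningRef);
+```
+
+Anchoring the pattern to the end of the line leaves parentheses elsewhere in a comment untouched. References in any other position would still need a manual pass.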
+
+
+
+
+## Deferred Ideas
+
+None — discussion stayed within phase scope
+
+
+
+---
+
+*Phase: 6-Address Tech Debt: Documentation and Code Cleanup*
+*Context gathered: 2026-05-08*
diff --git a/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-DISCUSSION-LOG.md b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-DISCUSSION-LOG.md
new file mode 100644
index 000000000..2eadca59c
--- /dev/null
+++ b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-DISCUSSION-LOG.md
@@ -0,0 +1,104 @@
+# Phase 6: Address Tech Debt: Documentation and Code Cleanup - Discussion Log
+
+> **Audit trail only.** Do not use as input to planning, research, or execution agents.
+> Decisions are captured in CONTEXT.md — this log preserves the alternatives considered.
+
+**Date:** 2026-05-08
+**Phase:** 06-tech-debt-documentation-code-cleanup
+**Areas discussed:** Documentation depth, Planning reference cleanup, Model name discrepancy, Hook structure
+
+---
+
+## Documentation Depth
+
+### Question 1: Documentation level
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| JSDoc on exports only | Add brief JSDoc to exported types, hook params, and component props. Nothing on internals. | |
+| JSDoc + module headers | JSDoc on exports PLUS 1-2 line module-level comment at top of each file. | |
+| You decide | Claude picks the level that matches existing codebase patterns best. | ✓ |
+
+**User's choice:** You decide
+**Notes:** Claude will follow codebase conventions (minimal JSDoc on public interfaces).
+
+### Question 2: Feature README
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| No feature README | Code should be self-documenting. Extension system is well-established. | ✓ |
+| Brief architecture doc | Short markdown covering Worker→Hook→Component data flow. | |
+| You decide | Claude decides based on how self-explanatory the code already is. | |
+
+**User's choice:** No feature README
+**Notes:** None.
+
+---
+
+## Planning Reference Cleanup
+
+### Question 1: Decision references in comments
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Remove references, keep intent | Strip (D-04), (AUDIO-03) suffixes but keep explanatory text. | ✓ |
+| Remove all planning comments | Strip both references AND explanatory text. | |
+| Keep as-is | Leave all comments including decision references. | |
+
+**User's choice:** Remove references, keep intent
+**Notes:** Planning refs belong in commit history, not code. Explanatory text stays.
+
+---
+
+## Model Name Discrepancy
+
+### Question 1: whisper-base vs whisper-small
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Change code to whisper-base | Align code to original spec. Switch to whisper-base (~140MB). | |
+| Update docs to whisper-small | Code is correct. Update PROJECT.md and REQUIREMENTS.md to reflect whisper-small q8 (~240MB). | ✓ |
+| Defer model decision | Mark as known discrepancy, don't change model in this phase. | |
+
+**User's choice:** Update docs to whisper-small
+**Notes:** Code ships whisper-small q8. Documentation must match reality.
+
+---
+
+## Hook Structure
+
+### Question 1: Sub-hook extraction
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Extract sub-hooks | Split into useWhisperWorker + useMediaRecording + useLocalTranscribe orchestrator. | |
+| Keep as one hook | Hook is complex but cohesive. Splitting may just move complexity around. | |
+| You decide | Claude assesses whether splitting genuinely improves clarity. | ✓ |
+
+**User's choice:** You decide
+**Notes:** Claude will assess based on coupling analysis.
+
+### Question 2: Ref-sync effects
+
+| Option | Description | Selected |
+|--------|-------------|----------|
+| Consolidate to one effect | Merge 4 ref-sync effects into single useEffect. | |
+| Leave separate | Each effect is clear about what it syncs. Idiomatic React. | |
+| You decide | Claude picks the approach matching existing codebase patterns. | ✓ |
+
+**User's choice:** You decide
+**Notes:** Claude will follow codebase conventions.
+
+---
+
+## Claude's Discretion
+
+- Documentation level — match existing codebase JSDoc patterns (D-01)
+- Hook refactoring — assess whether extraction genuinely improves clarity (D-05)
+- Ref-sync effect consolidation — follow existing codebase patterns (D-06)
+- Dead code identification and removal
+- Pattern consistency enforcement
+
+## Deferred Ideas
+
+None — discussion stayed within phase scope
diff --git a/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-REVIEW-FIX.md b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-REVIEW-FIX.md
new file mode 100644
index 000000000..41dc80eae
--- /dev/null
+++ b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-REVIEW-FIX.md
@@ -0,0 +1,65 @@
+---
+phase: 06-tech-debt-documentation-code-cleanup
+fixed_at: 2026-05-08T21:50:00Z
+review_path: .planning/phases/06-tech-debt-documentation-code-cleanup/06-REVIEW.md
+iteration: 1
+findings_in_scope: 6
+fixed: 6
+skipped: 0
+status: all_fixed
+---
+
+# Phase 6: Code Review Fix Report
+
+**Fixed at:** 2026-05-08T21:50:00Z
+**Source review:** .planning/phases/06-tech-debt-documentation-code-cleanup/06-REVIEW.md
+**Iteration:** 1
+
+**Summary:**
+- Findings in scope: 6
+- Fixed: 6
+- Skipped: 0
+
+## Fixed Issues
+
+### CR-01: Null dereference crash when worker is null during transcription send
+
+**Files modified:** `frontend/src/hooks/useLocalTranscribe.ts`
+**Commit:** 311b1e7
+**Applied fix:** Replaced non-null assertion `workerRef.current!.postMessage(...)` with a null guard that checks `workerRef.current` before calling `postMessage`. When the worker is null, the handler resets state to 'idle' and resolves the promise to prevent hangs.
+
+### CR-02: Promise never resolves when MediaRecorder.state diverges from hook state
+
+**Files modified:** `frontend/src/hooks/useLocalTranscribe.ts`
+**Commit:** 930839f
+**Applied fix:** Added an `else` branch to the `recorder.state === 'recording'` check that calls `cleanup()`, sets state to 'idle', and resolves the promise immediately when the MediaRecorder is already inactive. This prevents the promise from hanging indefinitely.
+
+### WR-01: Division by zero in computeRMS produces NaN
+
+**Files modified:** `frontend/src/workers/whisper.worker.ts`
+**Commit:** 9ff7e4e
+**Applied fix:** Added `if (samples.length === 0) return 0;` guard at the top of `computeRMS` to prevent division by zero when an empty `Float32Array` is received.
+
+### WR-02: User stuck in 'downloading' state if workerRef is null
+
+**Files modified:** `frontend/src/hooks/useLocalTranscribe.ts`
+**Commit:** 625c0ff
+**Applied fix:** Added null guard for `workerRef.current` before sending the 'load' message. When worker is null, resets `pendingRecordRef` to false and state to 'idle' to prevent the user from being stuck in the downloading state. Omitted the `toast.error` call from the review suggestion since the `loadFailed` i18n key may not exist.
+
+### WR-03: RecordingTimer warning threshold is negative when maxSeconds < 15
+
+**Files modified:** `frontend/src/pages/chat/conversation/RecordingTimer.tsx`
+**Commit:** cc9cac7
+**Applied fix:** Changed `const WARNING_THRESHOLD = maxSeconds - 15` to `const WARNING_THRESHOLD = Math.max(0, maxSeconds - 15)` to clamp the threshold to zero when `maxSeconds` is less than 15.
+
+### WR-04: Unmount cleanup calls cleanup() before stopping MediaRecorder
+
+**Files modified:** `frontend/src/hooks/useLocalTranscribe.ts`
+**Commit:** ce70d82
+**Applied fix:** Reordered the unmount cleanup effect to stop the MediaRecorder before calling `cleanup()`. This ensures proper event ordering -- the recorder stops first (allowing any pending `ondataavailable` events to fire with valid `audioChunksRef`), then cleanup releases the stream and resets refs.
+
+---
+
+_Fixed: 2026-05-08T21:50:00Z_
+_Fixer: Claude (gsd-code-fixer)_
+_Iteration: 1_
diff --git a/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-REVIEW.md b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-REVIEW.md
new file mode 100644
index 000000000..baebf9a63
--- /dev/null
+++ b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-REVIEW.md
@@ -0,0 +1,147 @@
+---
+phase: 06-tech-debt-documentation-code-cleanup
+reviewed: 2026-05-08T12:00:00Z
+depth: standard
+files_reviewed: 8
+files_reviewed_list:
+ - frontend/src/hooks/useLocalTranscribe.ts
+ - frontend/src/workers/whisper.worker.ts
+ - frontend/src/pages/chat/conversation/LocalTranscribeButton.tsx
+ - frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx
+ - frontend/src/pages/chat/conversation/PrivacyBadge.tsx
+ - frontend/src/pages/chat/conversation/RecordingTimer.tsx
+ - frontend/src/lib/audio-utils.ts
+ - backend/src/extensions/other/local-transcribe.ts
+findings:
+ critical: 2
+ warning: 4
+ info: 0
+ total: 6
+status: fixed
+---
+
+# Phase 6: Code Review Report
+
+**Reviewed:** 2026-05-08T12:00:00Z
+**Depth:** standard
+**Files Reviewed:** 8
+**Status:** fixed
+
+## Summary
+
+Reviewed eight files implementing the local (browser-side) Whisper transcription feature: a React hook managing the full lifecycle, a Web Worker for model inference, four UI components, a resampling utility, and a backend extension registration. The code is generally well-structured with good JSDoc coverage and sensible state management via refs. However, two critical bugs were found (a null dereference on the worker reference and a promise that never resolves), plus four robustness issues that could cause stuck states or unexpected behavior.
+
+## Critical Issues
+
+### CR-01: Null dereference crash when worker is null during transcription send
+
+**File:** `frontend/src/hooks/useLocalTranscribe.ts:288`
+**Issue:** The `recorder.onstop` callback at line 266 uses a non-null assertion `workerRef.current!.postMessage(...)` at line 288. This crashes with `TypeError: Cannot read properties of null` in at least two scenarios:
+
+1. **Component unmount during recording:** On unmount, the MediaRecorder cleanup effect (line 378) calls `recorder.stop()`, which only schedules the `onstop` callback, and the worker cleanup effect (line 250) sets `workerRef.current = null` and terminates the worker. Both cleanups run synchronously during unmount, so regardless of which one runs first, `workerRef.current` is already null when the `onstop` callback finally fires (asynchronously).
+
+2. **Theoretical race if `cancelDownload` is somehow called while state tracking is inconsistent.**
+
+**Fix:**
+```typescript
+// Line 288: Replace non-null assertion with a guard
+const worker = workerRef.current;
+if (!worker) {
+ setState('idle');
+ resolve();
+ return;
+}
+worker.postMessage(
+ { type: 'transcribe', audio: audioData, language: languageRef.current },
+ [audioData.buffer],
+);
+```
+
+### CR-02: Promise never resolves when MediaRecorder.state diverges from hook state
+
+**File:** `frontend/src/hooks/useLocalTranscribe.ts:300-303`
+**Issue:** `stopRecording` sets up the `recorder.onstop` handler (line 266) and then conditionally calls `recorder.stop()` only if `recorder.state === 'recording'` (line 300). If the browser's MediaRecorder has already transitioned to `'inactive'` or `'paused'` (due to a browser error, track ending, or timing race) while `stateRef.current` is still `'recording'`, then `recorder.stop()` is never called. The `onstop` event never fires, and the returned Promise never resolves. This silently hangs the `toggleRecording` call and leaves the UI stuck in the `'recording'` state indefinitely.
+
+**Fix:**
+```typescript
+// Lines 300-303: add an else branch so the promise always resolves
+if (recorder.state === 'recording') {
+ recorder.requestData();
+ recorder.stop();
+} else {
+ // MediaRecorder is already inactive -- resolve immediately
+ cleanup();
+ setState('idle');
+ resolve();
+}
+```
+
+## Warnings
+
+### WR-01: Division by zero in computeRMS produces NaN, bypassing silence detection
+
+**File:** `frontend/src/workers/whisper.worker.ts:48-54`
+**Issue:** `computeRMS` divides by `samples.length` without checking for zero length. If a zero-length `Float32Array` is received (e.g., from an extremely short recording that rounds to 0 samples during resampling), the result is `NaN`. On line 147, `NaN < SILENCE_RMS_THRESHOLD` evaluates to `false`, so silence detection is bypassed and the invalid audio is sent to the Whisper model, which could produce garbage output or throw.
+
+**Fix:**
+```typescript
+function computeRMS(samples: Float32Array): number {
+ if (samples.length === 0) return 0;
+ let sumSquares = 0;
+ for (let i = 0; i < samples.length; i++) {
+ sumSquares += samples[i] * samples[i];
+ }
+ return Math.sqrt(sumSquares / samples.length);
+}
+```
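+
+The consequence of the missing guard can be reproduced in isolation. The sketch below is an illustration, not code from the repository: `rmsUnguarded`, `rmsGuarded`, and the threshold value `0.01` are assumptions chosen for the example.
+
+```typescript
+const SILENCE_RMS_THRESHOLD = 0.01; // assumed value, for illustration only
+
+// Mirrors the unguarded computation: an empty input yields 0 / 0 = NaN.
+function rmsUnguarded(samples: Float32Array): number {
+  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
+  return Math.sqrt(sumSquares / samples.length);
+}
+
+// Same computation with the zero-length guard from the fix above.
+function rmsGuarded(samples: Float32Array): number {
+  if (samples.length === 0) return 0;
+  return rmsUnguarded(samples);
+}
+
+const empty = new Float32Array(0);
+// Any comparison with NaN is false, so the silence check does not trigger.
+const skippedUnguarded = rmsUnguarded(empty) < SILENCE_RMS_THRESHOLD;
+// With the guard, the RMS is 0 and the empty input is treated as silence.
+const skippedGuarded = rmsGuarded(empty) < SILENCE_RMS_THRESHOLD;
+```
+
+Returning `0` for empty input routes it through the existing silence path instead of adding a separate error branch.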
+
+### WR-02: User stuck in 'downloading' state if workerRef is null when load is triggered
+
+**File:** `frontend/src/hooks/useLocalTranscribe.ts:333-336`
+**Issue:** When `startRecording` triggers a model download, line 333 sets `pendingRecordRef.current = true`, line 334 sets state to `'downloading'`, and line 335 uses optional chaining (`workerRef.current?.postMessage`) to send the load message. If `workerRef.current` is null (e.g., due to a timing issue during effect cleanup, or because the `isSupported` check was bypassed by a parent), the load message is silently dropped but the state remains `'downloading'`. The user cannot recover: `cancelDownload` terminates a null worker and creates a new one, but `pendingRecordRef` remains true and `modelLoadedRef` remains false.
+
+**Fix:**
+```typescript
+const worker = workerRef.current;
+if (!worker) {
+ pendingRecordRef.current = false;
+ setState('idle');
+ toast.error(texts.chat.localTranscribe.loadFailed);
+ return;
+}
+pendingRecordRef.current = true;
+setState('downloading');
+worker.postMessage({ type: 'load' });
+```
+
+### WR-03: RecordingTimer warning threshold is negative when maxSeconds < 15
+
+**File:** `frontend/src/pages/chat/conversation/RecordingTimer.tsx:9`
+**Issue:** `WARNING_THRESHOLD = maxSeconds - 15` produces a negative value when `maxSeconds < 15`. This causes `isWarning` to be `true` from the very start of recording, showing the timer in red the entire time. While the default `maxDurationMs` (2 minutes = 120 seconds) avoids this, the component accepts arbitrary `maxSeconds` and should handle small values gracefully.
+
+**Fix:**
+```typescript
+const WARNING_THRESHOLD = Math.max(0, maxSeconds - 15);
+```
+
+### WR-04: Unmount cleanup calls cleanup() before stopping MediaRecorder, causing lost audio chunks
+
+**File:** `frontend/src/hooks/useLocalTranscribe.ts:378-385`
+**Issue:** The unmount cleanup effect calls `cleanup()` on line 380 which resets `audioChunksRef.current = []` (line 83), then calls `mediaRecorderRef.current.stop()` on line 382. This ordering means any `ondataavailable` events that fire between `cleanup()` and `stop()` would push into the already-cleared array. More importantly, `cleanup()` stops all media tracks first (line 76), which may cause the MediaRecorder to transition to `'inactive'` before `stop()` is called on line 382. When the `onstop` handler (from a previous `stopRecording` call still pending) fires, `audioChunksRef.current` is empty, triggering the "no audio recorded" error path instead of gracefully ignoring the unmount.
+
+**Fix:**
+```typescript
+return () => {
+ // Stop recorder BEFORE cleanup to preserve proper event ordering
+ if (mediaRecorderRef.current && mediaRecorderRef.current.state === 'recording') {
+ mediaRecorderRef.current.stop();
+ }
+ cleanup();
+};
+```
+
+---
+
+_Reviewed: 2026-05-08T12:00:00Z_
+_Reviewer: Claude (gsd-code-reviewer)_
+_Depth: standard_
diff --git a/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-VERIFICATION.md b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-VERIFICATION.md
new file mode 100644
index 000000000..3429e6047
--- /dev/null
+++ b/.planning/milestones/v1.0-phases/06-tech-debt-documentation-code-cleanup/06-VERIFICATION.md
@@ -0,0 +1,97 @@
+---
+phase: 06-tech-debt-documentation-code-cleanup
+verified: 2026-05-08T19:43:44Z
+status: passed
+score: 8/8 must-haves verified
+overrides_applied: 0
+---
+
+# Phase 6: Address Tech Debt: Documentation and Code Cleanup Verification Report
+
+**Phase Goal:** Improve code quality and maintainability of the local transcription feature through documentation improvements and code cleanup
+**Verified:** 2026-05-08T19:43:44Z
+**Status:** passed
+**Re-verification:** No -- initial verification
+
+## Goal Achievement
+
+### Observable Truths
+
+| # | Truth | Status | Evidence |
+|---|-------|--------|----------|
+| 1 | All planning reference suffixes (D-04, D-08, D-09, D-03, D-05, AUDIO-03) are removed from comments while explanatory text is preserved | VERIFIED | `grep -rn 'D-0[0-9]\|AUDIO-0[0-9]'` across all 8 files returns zero matches. Explanatory text confirmed preserved: "auto-start recording" (1 match), "RMS energy check" (1 match), "Hallucination filter" (1 match), "Aggregate download progress" (1 match). |
+| 2 | All ESLint and Prettier violations in local transcription files are resolved | VERIFIED | `npx eslint` exits 0 for all 7 frontend files. `npx eslint` exits 0 for backend file. `npx prettier --check` exits 0 for all 7 frontend files. |
+| 3 | Exported types and component props interfaces have JSDoc comments following codebase minimal patterns | VERIFIED | 7 JSDoc comments in useLocalTranscribe.ts (LocalTranscribeState, DownloadProgress, UseLocalTranscribeProps with 3 property docs, useLocalTranscribe function). 1 each in LocalTranscribeButton.tsx (LocalTranscribeButtonProps), DownloadProgressBanner.tsx (DownloadProgressBannerProps), audio-utils.ts (resampleToMono16kHz), whisper.worker.ts (WorkerMessageData). |
+| 4 | No dead code, unused imports, or redundant abstractions remain in local transcription modules | VERIFIED | ESLint passes with project config (which includes unused-imports rules). No TODO/FIXME/PLACEHOLDER patterns found in any file. |
+| 5 | All existing tests continue to pass | VERIFIED | 91 frontend tests pass across 7 test files (vitest exit 0). 5 backend tests pass (jest exit 0). Total: 96 tests, zero failures. |
+| 6 | PROJECT.md references whisper-small q8 (~240MB) everywhere instead of whisper-base (~140MB) | VERIFIED | `grep -c 'whisper-small' PROJECT.md` returns 6. `grep -in 'whisper-base' PROJECT.md` returns exactly 1 match, in the Key Decisions comparison context ("whisper-small q8 statt whisper-base"), which is a correct historical reference. `grep -c '~240MB' PROJECT.md` returns 4. Key Decisions row marked "Implemented". |
+| 7 | REQUIREMENTS.md references whisper-small q8 (~240MB) instead of whisper-base (~140MB) | VERIFIED | `grep -in 'whisper-base' REQUIREMENTS.md` returns zero matches. `grep -c 'whisper-small' REQUIREMENTS.md` returns 2. MODEL-01 correctly reads "whisper-small q8 Modell (~240MB)". |
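+| 8 | The Key Decisions table in PROJECT.md reflects the actual model choice with rationale | VERIFIED | Row reads: "whisper-small q8 statt whisper-base \| Bessere Genauigkeit bei akzeptabler Modellgroesse (~240MB vs ~140MB), q8 Quantisierung fuer reduzierte Dateigroesse \| Implemented". In English: "whisper-small q8 instead of whisper-base \| Better accuracy at an acceptable model size (~240MB vs ~140MB), q8 quantization for reduced file size \| Implemented". |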
+
+**Score:** 8/8 truths verified
+
+### Required Artifacts
+
+| Artifact | Expected | Status | Details |
+|----------|----------|--------|---------|
+| `frontend/src/hooks/useLocalTranscribe.ts` | Main hook with clean comments and JSDoc on exported types | VERIFIED | 399 lines. 7 JSDoc comments on exported types/function. Zero planning refs. ESLint clean. |
+| `frontend/src/workers/whisper.worker.ts` | Worker with clean comments and fixed Prettier formatting | VERIFIED | 176 lines. 1 JSDoc on WorkerMessageData. Zero planning refs. Prettier clean. |
+| `frontend/src/pages/chat/conversation/DownloadProgressBanner.tsx` | Component with fixed ESLint/Prettier violations | VERIFIED | 69 lines. 1 JSDoc on props. Import order correct. Set-state-in-effect fix via render-phase derivation. ESLint + Prettier clean. |
+| `frontend/src/pages/chat/conversation/PrivacyBadge.tsx` | Component with fixed Prettier | VERIFIED | 17 lines. Multi-line span collapsed. Prettier clean. |
+| `frontend/src/pages/chat/conversation/LocalTranscribeButton.tsx` | Component with JSDoc on props | VERIFIED | 93 lines. 1 JSDoc on LocalTranscribeButtonProps. ESLint clean. |
+| `frontend/src/lib/audio-utils.ts` | Utility with JSDoc on exported function | VERIFIED | 23 lines. 1 JSDoc on resampleToMono16kHz. ESLint clean. |
+| `backend/src/extensions/other/local-transcribe.ts` | Backend extension passing lint | VERIFIED | 38 lines. ESLint clean. No planning refs. |
+| `.planning/PROJECT.md` | Accurate project documentation matching shipped code | VERIFIED | Contains "whisper-small" 6 times. "whisper-base" only in comparison context. "~240MB" appears 4 times. |
+| `.planning/REQUIREMENTS.md` | Accurate requirements matching shipped code | VERIFIED | Contains "whisper-small" 2 times. Zero "whisper-base" references. MODEL-01 updated. |
+
+### Key Link Verification
+
+| From | To | Via | Status | Details |
+|------|----|-----|--------|---------|
+| `useLocalTranscribe.ts` | `whisper.worker.ts` | Worker message interface | WIRED | Hook posts `{ type: 'load' }` (line 335) and `{ type: 'transcribe' }` (line 288). Worker handles both message types (lines 111, 133). |
+| `LocalTranscribeButton.tsx` | `useLocalTranscribe.ts` | Exported types imported by component | WIRED | `import { LocalTranscribeState } from 'src/hooks/useLocalTranscribe'` (line 3). Type used in props interface (line 8). |
+| `.planning/PROJECT.md` | `whisper.worker.ts` | Model name consistency | WIRED | Both reference "whisper-small": PROJECT.md (6 occurrences), code uses `'onnx-community/whisper-small'` (worker line 80). |
+
+### Data-Flow Trace (Level 4)
+
+Not applicable -- this phase modifies comments, formatting, and documentation only. No new data-rendering artifacts were created.
+
+### Behavioral Spot-Checks
+
+| Behavior | Command | Result | Status |
+|----------|---------|--------|--------|
+| Frontend tests pass | `npx vitest run` (7 test files) | 91 tests passed, 0 failed | PASS |
+| Backend tests pass | `npx jest local-transcribe.spec.ts` | 5 tests passed, 0 failed | PASS |
+| ESLint clean (frontend) | `npx eslint` (7 files) | Exit 0, zero violations | PASS |
+| ESLint clean (backend) | `npx eslint local-transcribe.ts` | Exit 0, zero violations | PASS |
+| Prettier clean | `npx prettier --check` (7 files) | All files use Prettier code style | PASS |
+| No planning refs remain | `grep -rn 'D-0[0-9]\|AUDIO-0[0-9]'` (8 files) | Zero matches | PASS |
+
+### Requirements Coverage
+
+| Requirement | Source Plan | Description | Status | Evidence |
+|-------------|-----------|-------------|--------|----------|
+| PHASE-06-SC1 | 06-01, 06-02 | All local transcription components, hooks, and utilities have clear, accurate documentation | SATISFIED | JSDoc on all exported types/functions (11 JSDoc comments total). PROJECT.md and REQUIREMENTS.md updated to match shipped model. |
+| PHASE-06-SC2 | 06-01 | Dead code, unused imports, and redundant abstractions are removed | SATISFIED | ESLint passes clean for all 8 files with project config (includes unused-import rules). No TODO/FIXME markers found. |
+| PHASE-06-SC3 | 06-01 | Code follows consistent patterns across all local transcription modules | SATISFIED | Planning reference suffixes removed (9 occurrences across 3 files). Prettier formatting consistent. Import ordering corrected. Set-state-in-effect pattern replaced with React-recommended render-phase derivation. |
+
+### Anti-Patterns Found
+
+| File | Line | Pattern | Severity | Impact |
+|------|------|---------|----------|--------|
+| (none found) | - | - | - | - |
+
+No anti-patterns detected. All 8 files are clean of TODO/FIXME/PLACEHOLDER markers, stub implementations, and dead code.
+
+### Human Verification Required
+
+No human verification items identified. All changes are verifiable programmatically (comment removal, lint compliance, JSDoc presence, documentation text updates, test pass/fail).
+
+### Gaps Summary
+
+No gaps found. All 8 must-have truths are verified with concrete evidence. All 3 ROADMAP success criteria are satisfied. All artifacts exist, are substantive, and are properly wired. All 96 tests pass without regression.
+
+---
+
+_Verified: 2026-05-08T19:43:44Z_
+_Verifier: Claude (gsd-verifier)_
diff --git a/.planning/research/ARCHITECTURE.md b/.planning/research/ARCHITECTURE.md
new file mode 100644
index 000000000..bc1c28558
--- /dev/null
+++ b/.planning/research/ARCHITECTURE.md
@@ -0,0 +1,606 @@
+# Architecture Patterns
+
+**Domain:** Local browser-based speech recognition (Whisper via Transformers.js)
+**Researched:** 2026-05-07
+
+## Recommended Architecture
+
+### Overview
+
+The architecture isolates ML inference in a dedicated Web Worker, connects it to the existing Extension system via a thin backend extension (no middleware, no server-side processing), and presents a UI consistent with the existing `TranscribeButton` pattern. Audio flows from the microphone through the Web Audio API for resampling, then into the Worker for inference. The Worker manages the complete Transformers.js pipeline lifecycle (load, cache, infer, unload).
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│ Main Thread (React) │
+│ │
+│ ChatInput.tsx │
+│ ├─ detects extension name "transcribe-local" │
+│ ├─ renders LocalTranscribeButton (with language dropdown) │
+│ └─ uses useLocalTranscribe hook │
+│ │
+│ useLocalTranscribe hook │
+│ ├─ manages MediaRecorder (capture audio) │
+│ ├─ converts Blob → Float32Array@16kHz via AudioContext │
+│ ├─ owns Worker lifecycle (lazy init, message passing) │
+│ ├─ tracks states: idle | loading-model | recording | │
+│ │ processing | error │
+│ └─ exposes: toggleRecording, modelProgress, transcript │
+│ │
+│ Audio Resampling (in main thread, before Worker handoff) │
+│ └─ OfflineAudioContext.decodeAudioData() → resample to 16kHz │
+│ → extract mono channel → Float32Array │
+│ │
+├─────────────── postMessage (transferable ArrayBuffer) ──────────────┤
+│ │
+│ Web Worker: whisper.worker.ts │
+│ ├─ handles messages: load | transcribe | unload │
+│ ├─ singleton pipeline via AutomaticSpeechRecognitionPipeline │
+│ ├─ model: onnx-community/whisper-base (or whisper-base-ONNX) │
+│ ├─ Transformers.js caches to browser Cache API automatically │
+│ └─ posts back: loading-progress | ready | result | error │
+│ │
+└─────────────────────────────────────────────────────────────────────┘
+
+┌─────────────────────────────────────────────────────────────────────┐
+│ Backend │
+│ └─ LocalTranscribeExtension (type: "other", group: "speech-to- │
+│ text", name: "transcribe-local") │
+│ ├─ No arguments (no API keys, no server config) │
+│ ├─ No middlewares (inference happens in browser) │
+│ └─ Purpose: make extension visible in admin UI so it can be │
+│ assigned to assistants; frontend detects name to show UI │
+└─────────────────────────────────────────────────────────────────────┘
+```
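+
+The message labels in the diagram suggest a small typed protocol between the hook and the Worker. The following is a sketch under assumptions: the field names (`model`, `audio`, `language`, `progress`, `text`, `message`), the type names, and the `stateAfter` helper are hypothetical and only mirror the labels shown above.
+
+```typescript
+// Messages the hook sends to the Worker (labels taken from the diagram).
+type WorkerRequest =
+  | { type: 'load'; model: string }
+  | { type: 'transcribe'; audio: Float32Array; language: string }
+  | { type: 'unload' };
+
+// Messages the Worker posts back (labels taken from the diagram).
+type WorkerResponse =
+  | { type: 'loading-progress'; progress: number }
+  | { type: 'ready' }
+  | { type: 'result'; text: string }
+  | { type: 'error'; message: string };
+
+// Hook states as listed in the diagram.
+type HookState = 'idle' | 'loading-model' | 'recording' | 'processing' | 'error';
+
+// Assumed mapping from a Worker response to the hook state that follows it.
+// Returning to 'idle' after 'result' is an assumption of this sketch.
+function stateAfter(msg: WorkerResponse): HookState {
+  switch (msg.type) {
+    case 'loading-progress':
+      return 'loading-model';
+    case 'ready':
+      return 'idle';
+    case 'result':
+      return 'idle';
+    case 'error':
+      return 'error';
+  }
+}
+
+// Example request, using the model identifier named in the diagram.
+const loadRequest: WorkerRequest = { type: 'load', model: 'onnx-community/whisper-base' };
+```
+
+Modelling both directions as discriminated unions lets the compiler flag an unhandled message type whenever the protocol grows.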
+
+### Component Boundaries
+
+| Component | Responsibility | Communicates With |
+|-----------|---------------|-------------------|
+| `LocalTranscribeExtension` (backend) | Registers extension in system, enables admin assignment to assistants. Zero server-side logic for transcription. | Frontend via configuration DTO (extension name visible in `configuration.extensions`) |
+| `ChatInput.tsx` (frontend) | Detects `transcribe-local` extension name, renders appropriate button component. Follows same pattern as existing `speech-to-text` / `transcribe-azure` detection. | `useLocalTranscribe` hook |
+| `LocalTranscribeButton` (frontend) | UI component: microphone button + language dropdown + progress bar during model download. Follows `SpeechRecognitionButton` layout pattern (button + dropdown). | `useLocalTranscribe` hook (receives state, emits toggle actions) |
+| `useLocalTranscribe` hook (frontend) | Orchestrates recording, audio preprocessing, Worker communication, and state management. Owns the full lifecycle. | MediaRecorder API, AudioContext API, `whisper.worker.ts` via `postMessage` |
+| `whisper.worker.ts` (frontend) | Runs Transformers.js pipeline in isolated thread. Manages model singleton, performs inference, reports progress. | Transformers.js / ONNX Runtime (internal), main thread via `postMessage` |
+
+### Data Flow
+
+**Phase 1: Model Loading (first use or cache miss)**
+
+```
+User clicks mic button
+ → useLocalTranscribe: check if Worker exists, if not create it
+ → postMessage({ type: 'load', model: 'onnx-community/whisper-base', quantized: true })
+ → Worker: pipeline('automatic-speech-recognition', modelId, {
+ dtype: 'q8', // quantized for size/speed balance
+ progress_callback: (e) => self.postMessage({ type: 'loading-progress', ...e })
+ })
+ → Worker downloads model files (~140MB), Transformers.js caches them in Cache API
+ → Worker: self.postMessage({ type: 'ready' })
+ → useLocalTranscribe: set state to 'idle', model is loaded
+```
+
+**Phase 2: Record-then-Transcribe (normal operation)**
+
+```
+User clicks mic button (model already loaded)
+ → useLocalTranscribe: start MediaRecorder with getUserMedia({ audio: true })
+ → MediaRecorder collects chunks every 100ms (same as existing useTranscribe)
+ → 2-minute max timer running
+
+User clicks mic button again (stop)
+ → MediaRecorder.stop()
+ → Collect all Blob chunks into single Blob (audio/webm)
+ → Convert Blob to ArrayBuffer via blob.arrayBuffer()
+ → AudioContext.decodeAudioData(arrayBuffer) → AudioBuffer
+ → Resample to 16kHz mono:
+ const offlineCtx = new OfflineAudioContext(1, Math.ceil(duration * 16000), 16000);
+ const source = offlineCtx.createBufferSource();
+ source.buffer = audioBuffer;
+ source.connect(offlineCtx.destination);
+ source.start();
+ const resampled = await offlineCtx.startRendering();
+ const float32 = resampled.getChannelData(0); // mono Float32Array
+ → postMessage(
+ { type: 'transcribe', audio: float32.buffer, language: 'de' },
+ [float32.buffer] // transfer ownership, zero-copy
+ )
+ → Worker: reconstruct Float32Array from transferred buffer
+ → Worker: pipeline(float32Audio, { language, task: 'transcribe' })
+ → Worker: self.postMessage({ type: 'result', text: transcribedText })
+ → useLocalTranscribe: call onTranscriptReceived(text) → sets input value
+```
+
+**Phase 3: Future real-time streaming (not implemented in v1)**
+
+```
+Preparation in architecture:
+ - Worker message protocol includes 'transcribe-chunk' type (reserved, not handled)
+ - Worker singleton pattern allows streaming chunks to same loaded model
+ - useLocalTranscribe state machine has extensible states
+ - AudioWorklet could replace MediaRecorder for continuous 16kHz PCM streaming
+
+Future flow:
+ → AudioWorklet captures 16kHz PCM directly (no post-processing needed)
+ → Chunks posted to Worker every N seconds
+ → Worker processes with chunk_length_s / stride_length_s for overlap
+ → Worker posts partial results back incrementally
+ → Hook accumulates partial transcripts in real time
+```
+
+## Component Specifications
+
+### Backend Extension: `LocalTranscribeExtension`
+
+```typescript
+// backend/src/extensions/other/local-transcribe.ts
+@Extension()
+export class LocalTranscribeExtension implements Extension {
+ constructor(private readonly i18n: I18nService) {}
+
+ get spec(): ExtensionSpec {
+ return {
+ name: 'transcribe-local',
+ group: 'speech-to-text', // mutually exclusive with other STT extensions
+ title: this.i18n.t('texts.extensions.transcribeLocal.title'),
+ logo: '...microphone SVG...',
+ description: this.i18n.t('texts.extensions.transcribeLocal.description'),
+ type: 'other',
+ arguments: {}, // no server-side configuration needed
+ };
+ }
+
+ getMiddlewares(): Promise {
+ return Promise.resolve([]); // no chat pipeline involvement
+ }
+}
+```
+
+**Why `group: 'speech-to-text'`:** The existing extensions use this group to enforce mutual exclusivity -- only one voice input method per assistant. The `ChatInput.tsx` filtering logic at line 179-183 picks the first matching voice extension. Adding `transcribe-local` to the same group means admin can choose exactly one of: Web Speech API, Azure Transcribe, or Local Whisper per assistant.
+
+### Web Worker: `whisper.worker.ts`
+
+```typescript
+// frontend/src/workers/whisper.worker.ts
+import { pipeline, env } from '@huggingface/transformers';
+import type { AutomaticSpeechRecognitionPipeline } from '@huggingface/transformers';
+
+// Disable local model check (browser-only, download from HF Hub)
+env.allowLocalModels = false;
+
+// Message types -- explicit union for type safety
+type IncomingMessage =
+ | { type: 'load'; model: string; quantized: boolean }
+ | { type: 'transcribe'; audio: ArrayBuffer; language: string }
+ | { type: 'unload' };
+
+type OutgoingMessage =
+ | { type: 'loading-progress'; status: string; progress?: number; file?: string }
+ | { type: 'ready' }
+ | { type: 'result'; text: string }
+ | { type: 'error'; message: string };
+
+let pipelineInstance: AutomaticSpeechRecognitionPipeline | null = null;
+let currentModelId: string | null = null;
+
+async function loadModel(modelId: string, quantized: boolean) {
+ if (pipelineInstance && currentModelId === modelId) {
+ self.postMessage({ type: 'ready' } as OutgoingMessage);
+ return;
+ }
+
+ pipelineInstance = await pipeline(
+ 'automatic-speech-recognition',
+ modelId,
+ {
+ dtype: quantized ? 'q8' : 'fp32',
+ progress_callback: (data: any) => {
+ self.postMessage({
+ type: 'loading-progress',
+ ...data,
+ } as OutgoingMessage);
+ },
+ }
+ );
+ currentModelId = modelId;
+ self.postMessage({ type: 'ready' } as OutgoingMessage);
+}
+
+async function transcribe(audioBuffer: ArrayBuffer, language: string) {
+ if (!pipelineInstance) {
+ self.postMessage({ type: 'error', message: 'Model not loaded' } as OutgoingMessage);
+ return;
+ }
+
+ const audioData = new Float32Array(audioBuffer);
+ const result = await pipelineInstance(audioData, {
+ language,
+ task: 'transcribe',
+ chunk_length_s: 30,
+ stride_length_s: 5,
+ });
+
+ const text = Array.isArray(result) ? result.map(r => r.text).join(' ') : result.text;
+ self.postMessage({ type: 'result', text } as OutgoingMessage);
+}
+
+self.addEventListener('message', async (event: MessageEvent) => {
+ const { type } = event.data;
+ try {
+ switch (type) {
+ case 'load':
+ await loadModel(event.data.model, event.data.quantized);
+ break;
+ case 'transcribe':
+ await transcribe(event.data.audio, event.data.language);
+ break;
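+ case 'unload':
+ // Release the underlying model resources before dropping the reference
+ await pipelineInstance?.dispose();
+ pipelineInstance = null;
+ currentModelId = null;
+ break;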
+ }
+ } catch (error) {
+ self.postMessage({
+ type: 'error',
+ message: error instanceof Error ? error.message : 'Unknown error',
+ } as OutgoingMessage);
+ }
+});
+```
+
+### React Hook: `useLocalTranscribe`
+
+```typescript
+// frontend/src/hooks/useLocalTranscribe.ts
+// State machine: idle → loading-model → idle → recording → processing → idle
+// → error → idle
+
+export type LocalTranscribeState =
+ | 'idle'
+ | 'loading-model'
+ | 'recording'
+ | 'processing'
+ | 'error';
+
+interface UseLocalTranscribeProps {
+ onTranscriptReceived: (transcript: string) => void;
+ maxDurationMs?: number;
+ model?: string;
+ language?: string;
+}
+
+export function useLocalTranscribe({
+ onTranscriptReceived,
+ maxDurationMs = 2 * 60 * 1000, // 2 minutes
+ model = 'onnx-community/whisper-base',
+ language = 'de',
+}: UseLocalTranscribeProps) {
+ // Worker ref: created once, reused across recordings
+ // MediaRecorder refs: same pattern as existing useTranscribe
+ // Model loading progress: { loaded: number, total: number, file: string }
+ // State: LocalTranscribeState
+
+ // Key behaviors:
+ // 1. Worker is lazily created on first toggle
+ // 2. Model loads on first toggle, stays loaded for session
+ // 3. Recording uses same MediaRecorder pattern as useTranscribe
+ // 4. After stop: Blob → ArrayBuffer → AudioContext resample → Worker
+ // 5. Worker result → onTranscriptReceived callback
+
+ return {
+ state, // LocalTranscribeState
+ isRecording, // state === 'recording'
+ isProcessing, // state === 'processing'
+ isModelLoading, // state === 'loading-model'
+ modelProgress, // { loaded, total, percent } | null
+ isModelReady, // pipeline loaded and ready
+ toggleRecording, // () => void
+ };
+}
+```
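+
+The state machine in the comment above can be made explicit as a pure transition function. This is a minimal sketch, not the hook's actual implementation: `nextState` and the event names (`load-started`, `model-ready`, and so on) are hypothetical, and only the state names come from `LocalTranscribeState`.
+
+```typescript
+// State names mirror LocalTranscribeState; event names are illustrative assumptions.
+type LocalTranscribeState = 'idle' | 'loading-model' | 'recording' | 'processing' | 'error';
+
+type HookEvent =
+  | { type: 'load-started' }
+  | { type: 'model-ready' }
+  | { type: 'recording-started' }
+  | { type: 'recording-stopped' }
+  | { type: 'result' }
+  | { type: 'failed' }
+  | { type: 'reset' };
+
+// Returns the next state; events that are invalid for the current state are ignored.
+function nextState(state: LocalTranscribeState, event: HookEvent): LocalTranscribeState {
+  switch (event.type) {
+    case 'failed':
+      return 'error'; // any state can fail
+    case 'reset':
+      return state === 'error' ? 'idle' : state;
+    case 'load-started':
+      return state === 'idle' ? 'loading-model' : state;
+    case 'model-ready':
+      return state === 'loading-model' ? 'idle' : state;
+    case 'recording-started':
+      return state === 'idle' ? 'recording' : state;
+    case 'recording-stopped':
+      return state === 'recording' ? 'processing' : state;
+    case 'result':
+      return state === 'processing' ? 'idle' : state;
+  }
+}
+```
+
+Keeping transitions in one pure function means a later `'streaming'` state would touch the type and this switch only, and the transitions can be unit-tested without a Worker or a microphone.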
+
+### Audio Resampling Utility
+
+```typescript
+// frontend/src/lib/audio-utils.ts
+
+/**
+ * Convert a Blob of recorded audio (webm/ogg) to a 16kHz mono Float32Array
+ * suitable for Whisper inference.
+ */
+export async function audioToFloat32At16kHz(blob: Blob): Promise<Float32Array> {
+ const arrayBuffer = await blob.arrayBuffer();
+ const audioContext = new AudioContext();
+ const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
+
+ // Resample to 16kHz mono using OfflineAudioContext
+ const targetSampleRate = 16000;
+ const duration = audioBuffer.duration;
+ const offlineCtx = new OfflineAudioContext(
+ 1, // mono
+ Math.ceil(duration * targetSampleRate), // total samples
+ targetSampleRate
+ );
+
+ const source = offlineCtx.createBufferSource();
+ source.buffer = audioBuffer;
+ source.connect(offlineCtx.destination);
+ source.start(0);
+
+ const resampled = await offlineCtx.startRendering();
+
+ await audioContext.close();
+ return resampled.getChannelData(0); // Float32Array, mono, 16kHz
+}
+```
+
+### Vite Configuration Changes
+
+```typescript
+// vite.config.ts additions
+
+export default defineConfig({
+ // ... existing config ...
+ worker: {
+ format: 'es', // enable ES module imports in workers
+ },
+ // COOP/COEP headers for SharedArrayBuffer (enables WASM multi-threading)
+ // Only needed in dev; production requires server-side header configuration
+ plugins: [
+ react(),
+ tailwindcss(),
+ {
+ name: 'configure-response-headers',
+ configureServer(server) {
+ server.middlewares.use((_req, res, next) => {
+ res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
+ res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
+ next();
+ });
+ },
+ },
+ ],
+});
+```
+
+**Important COOP/COEP note:** These headers enable `SharedArrayBuffer` which ONNX Runtime WASM uses for multi-threaded inference. Without them, inference falls back to single-threaded mode (slower but functional). The headers may affect other cross-origin resources (e.g., the proxy to backend at `/api-proxy`). If conflicts arise, Transformers.js still works without SharedArrayBuffer -- it just runs single-threaded. Test carefully before committing to these headers.
+
+### ChatInput.tsx Integration
+
+```typescript
+// Extend the existing voice extension detection (lines 179-183):
+
+const voiceExtensions =
+ configuration?.extensions?.filter(
+ (e) => e.name === 'speech-to-text'
+ || e.name === 'transcribe-azure'
+ || e.name === 'transcribe-local' // NEW
+ ) ?? [];
+const activeVoiceExtension = voiceExtensions[0];
+const showSpeechToText = activeVoiceExtension?.name === 'speech-to-text';
+const showTranscribe = activeVoiceExtension?.name === 'transcribe-azure';
+const showLocalTranscribe = activeVoiceExtension?.name === 'transcribe-local'; // NEW
+
+// In the JSX, add a third branch:
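+{showSpeechToText ? (
+ <SpeechRecognitionButton /* existing props */ />
+) : showTranscribe ? (
+ <TranscribeButton /* existing props */ />
+) : showLocalTranscribe ? (
+ <LocalTranscribeButton /* props from useLocalTranscribe */ /> // NEW
+) : null}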
+```
+
+## Patterns to Follow
+
+### Pattern 1: Singleton Pipeline in Worker
+
+**What:** Create the Transformers.js pipeline once, reuse for all transcriptions within a session. Store as module-level variable in the Worker.
+
+**When:** Always -- model loading is the expensive operation (~140MB download + WASM compilation). Inference is comparatively fast.
+
+**Why:** Loading whisper-base takes 5-30 seconds depending on connection and cache state. Users will transcribe multiple times per session. The singleton avoids re-initialization on every recording.
+
+**Guard:** Check if `currentModelId` matches requested model before reloading. If model changes (future: model selection), dispose old pipeline and create new one.
+
+### Pattern 2: Transferable ArrayBuffer for Audio
+
+**What:** When posting audio data from main thread to Worker, use the `transfer` parameter of `postMessage` to transfer ownership of the ArrayBuffer instead of copying it.
+
+**When:** Every transcription request.
+
+**Why:** Audio at 16kHz mono for 2 minutes is 1,920,000 Float32 samples, or ~7.7MB. Structured cloning (the default) would copy this data. Transfer moves it zero-copy. The main thread no longer needs the audio data after posting.
+
+```typescript
+const float32 = await audioToFloat32At16kHz(blob);
+worker.postMessage(
+ { type: 'transcribe', audio: float32.buffer, language },
+ [float32.buffer] // transfer list
+);
+// float32.buffer is now detached (neutered) in main thread -- this is fine
+```
+
+### Pattern 3: Follow Existing Hook Return Shape
+
+**What:** The `useLocalTranscribe` hook should expose the same essential interface as `useTranscribe`: `isRecording`, `isTranscribing` (here: `isProcessing`), `toggleRecording`. Add model-specific extras (`isModelLoading`, `modelProgress`) as additional properties.
+
+**When:** Designing the hook API.
+
+**Why:** Consistency with existing codebase. The `TranscribeButton` and `LocalTranscribeButton` components should feel interchangeable to the developer. The `ChatInput.tsx` integration should read naturally alongside the existing hooks.
+
+### Pattern 4: Lazy Worker Initialization
+
+**What:** Do not create the Web Worker at component mount. Create it on first user interaction (first mic button click).
+
+**When:** Always.
+
+**Why:** Workers consume memory even when idle. Many users/assistants will not have the local transcribe extension enabled. Lazy init means zero overhead for non-users. Also avoids issues with Worker module loading during SSR or testing.
+
+```typescript
+const workerRef = useRef<Worker | null>(null);
+
+function getOrCreateWorker() {
+ if (!workerRef.current) {
+ workerRef.current = new Worker(
+ new URL('../workers/whisper.worker.ts', import.meta.url),
+ { type: 'module' }
+ );
+ workerRef.current.addEventListener('message', handleWorkerMessage);
+ }
+ return workerRef.current;
+}
+```
+
+### Pattern 5: Progress Aggregation for Model Download
+
+**What:** Transformers.js emits per-file progress events during model loading (config.json, tokenizer.json, encoder model, decoder model, etc.). Aggregate these into a single overall progress percentage for the UI.
+
+**When:** During model download/loading phase.
+
+**Why:** Users need a single meaningful progress indicator, not per-file noise. Transformers.js v4 adds a `progress_total` event type that simplifies this. For v3, track a `{ file: progress }` map and average the per-file percentages (ideally weighted by file size).
+
+```typescript
+// In Worker, with Transformers.js v4:
+progress_callback: (e) => {
+ if (e.status === 'progress_total') {
+ self.postMessage({ type: 'loading-progress', percent: e.progress });
+ }
+}
+
+// Fallback for v3 (per-file):
+const fileProgress = new Map<string, number>();
+progress_callback: (e) => {
+ if (e.status === 'progress') {
+ fileProgress.set(e.file, e.progress);
+ // Unweighted mean of the per-file percentages seen so far
+ const average = [...fileProgress.values()].reduce((a, b) => a + b, 0) / fileProgress.size;
+ self.postMessage({ type: 'loading-progress', percent: average });
+ }
+}
+```
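+
+The v3 fallback above takes an unweighted mean, so a small `config.json` counts as much as the encoder weights. A byte-weighted variant might look like the sketch below. It assumes each progress event carries `file`, `loaded`, and `total` byte counts; `FileProgressEvent` and `ProgressAggregator` are hypothetical names, not part of Transformers.js.
+
+```typescript
+// Hypothetical event shape: only these three fields are used.
+interface FileProgressEvent {
+  file: string;
+  loaded: number; // bytes downloaded so far for this file
+  total: number; // expected total bytes for this file
+}
+
+class ProgressAggregator {
+  private readonly files = new Map<string, { loaded: number; total: number }>();
+
+  // Records the latest event for a file and returns the overall percentage (0-100).
+  update(e: FileProgressEvent): number {
+    this.files.set(e.file, { loaded: e.loaded, total: e.total });
+    let loaded = 0;
+    let total = 0;
+    this.files.forEach((f) => {
+      loaded += f.loaded;
+      total += f.total;
+    });
+    return total > 0 ? (loaded / total) * 100 : 0;
+  }
+}
+```
+
+One caveat of this approach: the percentage can move backwards when a large file reports its first event, because the known total grows. A UI could clamp the displayed value to the highest percentage seen so far.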
+
+## Anti-Patterns to Avoid
+
+### Anti-Pattern 1: Running Transformers.js on Main Thread
+
+**What:** Importing and running the pipeline directly in the React component or hook, without a Web Worker.
+
+**Why bad:** Whisper inference is CPU-intensive. Even whisper-base takes 2-10 seconds for a 30-second clip. Running on main thread freezes the entire UI -- no animations, no button clicks, no scroll. Users will think the app crashed.
+
+**Instead:** Always run in a Web Worker. The Worker thread has its own event loop and cannot block the main thread.
+
+### Anti-Pattern 2: Creating a New Worker Per Transcription
+
+**What:** `new Worker(...)` on every mic button press, terminating after each transcription.
+
+**Why bad:** Each Worker creation re-initializes the WASM runtime and must reload the model pipeline (even from cache, this takes seconds). Workers are designed to be long-lived.
+
+**Instead:** Create once (lazily), reuse for the session. Terminate only on component unmount or explicit unload.
+
+### Anti-Pattern 3: Sending Audio as Structured Clone
+
+**What:** `worker.postMessage({ audio: float32Array })` without transfer list.
+
+**Why bad:** Structured cloning copies the entire Float32Array. For 2 minutes of 16kHz mono audio, that is 1,920,000 floats = ~7.7MB copied. Transfer is zero-copy and instant.
+
+**Instead:** Use `worker.postMessage(msg, [float32Array.buffer])` with transfer list.
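+
+The size figure above follows directly from the sample count. As a small worked example (the helper name `pcmByteLength` is hypothetical):
+
+```typescript
+// Bytes needed for Float32 PCM audio of the given duration.
+function pcmByteLength(durationSeconds: number, sampleRate = 16000, channels = 1): number {
+  const samples = Math.ceil(durationSeconds * sampleRate) * channels;
+  return samples * Float32Array.BYTES_PER_ELEMENT; // 4 bytes per sample
+}
+
+// 2 minutes at 16kHz mono: 120 * 16000 = 1,920,000 samples * 4 bytes
+const twoMinutes = pcmByteLength(120);
+```
+
+Here `twoMinutes` is 7,680,000 bytes (~7.7MB), which is the amount a structured clone would copy on every transcription at the maximum recording length.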
+
+### Anti-Pattern 4: Resampling in the Web Worker
+
+**What:** Sending the raw MediaRecorder Blob to the Worker and doing audio decoding there.
+
+**Why bad:** `AudioContext` and `OfflineAudioContext` are not available in Web Workers; they are main-thread Web APIs. Decoding in the Worker would mean bundling a JavaScript or WASM audio decoder, which adds significant bundle size.
+
+**Instead:** Resample in the main thread using `OfflineAudioContext`, then transfer the resulting Float32Array to the Worker. This is fast (native browser implementation) and the data is ready for Whisper immediately.
+
+### Anti-Pattern 5: Bundling the Model in the App
+
+**What:** Including the ~140MB Whisper model in the Vite build output.
+
+**Why bad:** Massively inflates app bundle for all users, even those who never use local transcription. Vite build times would be terrible. Cache invalidation on every deploy.
+
+**Instead:** The model loads on-demand from the Hugging Face Hub. Transformers.js automatically caches downloaded files in the browser's Cache API. Second load is fast (local cache hit, no network).
+
+## File Structure
+
+```
+frontend/src/
+ workers/
+ whisper.worker.ts # Web Worker with Transformers.js pipeline
+ hooks/
+ useLocalTranscribe.ts # React hook orchestrating recording + worker
+ lib/
+ audio-utils.ts # audioToFloat32At16kHz resampling utility
+ pages/chat/conversation/
+ LocalTranscribeButton.tsx # UI component (mic + language dropdown + progress)
+ ChatInput.tsx # Modified: add transcribe-local detection
+
+backend/src/
+ extensions/other/
+ local-transcribe.ts # Extension registration (name, group, type)
+ localization/
+ *.json # Add transcribeLocal.title and .description
+```
+
+## Scalability Considerations
+
+| Concern | Record-then-Transcribe (v1) | Future Real-time Streaming |
+|---------|----------------------------|---------------------------|
+| Audio capture | MediaRecorder (simple, proven) | AudioWorklet (continuous PCM at 16kHz) |
+| Audio preprocessing | OfflineAudioContext resample after stop | AudioWorklet produces 16kHz PCM directly |
+| Chunk strategy | Full recording sent as one chunk | Overlapping chunks (30s window, 5s stride) |
+| Worker message frequency | 1 message per recording | Many messages (every 2-5 seconds) |
+| Model memory | ~200MB WASM heap, acceptable | Same -- model stays loaded |
+| Browser compatibility | All modern browsers | AudioWorklet: Chrome, Firefox, Safari 14.1+ |
+| Latency | Acceptable (post-recording) | Critical (user expects <2s feedback) |
+
+## Preparing for Real-time Without Over-engineering
+
+The v1 architecture prepares for real-time streaming through these specific choices, none of which add implementation cost now:
+
+1. **Worker message protocol is typed and extensible.** Adding `{ type: 'transcribe-chunk' }` later requires no protocol changes.
+
+2. **Worker singleton pattern.** The loaded model stays in memory. Streaming just sends more frequent messages to the same pipeline.
+
+3. **Audio utility is a separate module.** When switching to AudioWorklet for real-time, the `audio-utils.ts` module can be extended or a parallel path can be added without touching the Worker.
+
+4. **State machine in hook is explicit.** Adding `'streaming'` state later is a one-line type change plus handler logic, not a refactor.
+
+5. **What NOT to build now:** Do not create an AudioWorklet, do not implement chunked transcription with overlap, do not build partial-result accumulation UI. These are all real-time concerns that add complexity without value for record-then-transcribe.
+
+## Build Order (Dependency Graph)
+
+```
+Phase 1: Foundation
+ 1a. Backend extension (local-transcribe.ts) # no dependencies
+ 1b. Audio resampling utility (audio-utils.ts) # no dependencies
+ 1c. Web Worker (whisper.worker.ts) # depends on @huggingface/transformers
+
+ Can build 1a, 1b, 1c in parallel.
+
+Phase 2: Integration
+ 2a. useLocalTranscribe hook # depends on 1b (audio-utils), 1c (worker)
+ 2b. Vite config (worker format, optional COOP/COEP) # depends on nothing, but test with 1c
+
+Phase 3: UI
+ 3a. LocalTranscribeButton component # depends on 2a (hook interface)
+ 3b. ChatInput.tsx modification # depends on 3a, 2a
+ 3c. i18n texts # depends on nothing, but needed by 3a, 3b
+
+Phase 4: Polish
+ 4a. Progress bar UI for model download # depends on 2a (modelProgress from hook)
+ 4b. Error handling edge cases # depends on all above
+ 4c. COOP/COEP header investigation and production # depends on deployment infrastructure
+ server configuration
+```
+
+## Sources
+
+- [whisper-web (reference implementation)](https://github.com/xenova/whisper-web) -- HIGH confidence
+- [Transformers.js documentation](https://huggingface.co/docs/transformers.js/index) -- HIGH confidence
+- [Transformers.js v4 blog (ModelRegistry, progress_total)](https://huggingface.co/blog/transformersjs-v4) -- HIGH confidence
+- [Transformers.js v3 blog (WebGPU, ASR support)](https://huggingface.co/blog/transformersjs-v3) -- HIGH confidence
+- [onnx-community/whisper-base model card](https://huggingface.co/onnx-community/whisper-base) -- HIGH confidence
+- [Speech Recognition in the Browser with Transformers.js](https://blog.rasc.ch/2025/01/transformers-js-speech.html) -- MEDIUM confidence (community blog, verified patterns)
+- [Offline Whisper: Browser + Node.js (AssemblyAI)](https://www.assemblyai.com/blog/offline-speech-recognition-whisper-browser-node-js) -- MEDIUM confidence
+- [Vite Web Workers documentation](https://vite-workshop.netlify.app/web-workers) -- MEDIUM confidence
+- [COOP/COEP for SharedArrayBuffer (web.dev)](https://web.dev/articles/coop-coep) -- HIGH confidence
+- [whisper-web DeepWiki architecture analysis](https://deepwiki.com/xenova/whisper-web) -- MEDIUM confidence
diff --git a/.planning/research/FEATURES.md b/.planning/research/FEATURES.md
new file mode 100644
index 000000000..7f1c504ec
--- /dev/null
+++ b/.planning/research/FEATURES.md
@@ -0,0 +1,121 @@
+# Feature Landscape
+
+**Domain:** Local browser-based speech recognition (Whisper via Transformers.js)
+**Researched:** 2026-05-07
+**Context:** Brownfield integration into c4 GenAI Suite, which already has two cloud-based speech recognition options (Web Speech API via `speech-to-text`, Azure Whisper via `transcribe-azure`). The new local option must feel native alongside these existing implementations.
+
+## Table Stakes
+
+Features users expect. Missing = product feels incomplete or broken.
+
+| Feature | Why Expected | Complexity | Notes |
+|---------|--------------|------------|-------|
+| **Microphone toggle button** | Users need a single, obvious control to start/stop recording. Both existing implementations use an `ActionIcon` toggle pattern -- consistency is mandatory. | Low | Follow `TranscribeButton` pattern: single button, state-driven icon/color. The `SpeechRecognitionButton` split-button with language dropdown is the more complex pattern to replicate. |
+| **Recording state indication** | Users must know when the mic is hot. Without visual feedback, users don't know if they are being recorded. Both existing buttons use `animate-pulse` + red fill when active. | Low | Red pulsing icon on recording, disabled/loading spinner on transcribing, outline/black on idle. Matches existing `TranscribeButton` exactly. |
+| **Transcription progress indicator** | After recording stops, local Whisper inference takes several seconds (5-30s depending on audio length and device). Silence during processing feels broken. The existing Azure path shows a loading spinner via Mantine's `loading` prop. | Low | Use Mantine `ActionIcon` `loading={isTranscribing}` like the existing `TranscribeButton`. Consider a brief toast or inline status text for longer transcriptions. |
+| **Model download progress bar** | The Whisper model is ~140MB. A first-time download without progress feedback looks like the app is frozen. This is the single most important UX difference from the cloud-based options. | Medium | Transformers.js `progress_callback` provides `loaded`/`total` bytes per file. Aggregate into a single percentage. Show a Mantine `Progress` bar or modal with percentage and "Downloading speech model..." text. Transformers.js caches the model in the browser Cache API automatically, so progress only appears on first use. |
+| **Language selection (de/en)** | The project explicitly requires de/en support. The existing `SpeechRecognitionButton` already has a language dropdown (de-DE, en-US). Users expect the same control for the local option. Whisper multilingual models accept a language token to guide transcription. | Low | Reuse the existing `Language` type and `SpeechRecognitionButton` split-button pattern. Map `de-DE` to `de` and `en-US` to `en` for the pipeline's `language` option, matching the `language: 'de'` used in the Worker messages; the pipeline is expected to derive the `<\|de\|>` / `<\|en\|>` token from that code. The whisper-base multilingual model supports both. |
+| **Microphone permission handling** | Users must grant mic access. Denied permission must show a clear, actionable error. Existing `useTranscribe` already handles `NotAllowedError` with a dedicated toast message. | Low | Reuse existing error text: `texts.chat.transcribe.microphonePermissionDenied`. Show before any model loading occurs -- don't download 140MB only to fail on mic access. |
+| **Error handling with user-facing messages** | Network failures during model download, unsupported browsers, transcription failures -- all must surface actionable messages, not silent failures. Existing hooks use `toast.error()` consistently. | Low | Use `react-toastify` toast.error() for all error states. Key error scenarios: model download failed (network), browser not supported (no Web Worker/WASM), transcription returned empty, audio too short. Follow existing i18n pattern with new text keys under `texts.chat.localTranscribe`. |
+| **Max duration enforcement (2 min)** | Prevents runaway memory usage and keeps inference time manageable. The existing `useTranscribe` already implements this with `maxDurationMs` and auto-stop. | Low | Same pattern: `setInterval` checking elapsed time, auto-stop at 120,000ms, toast.info with max duration message. |
+| **Transcript insertion into chat input** | The transcribed text must appear in the textarea, ready to send. Both existing implementations call `onTranscriptReceived(result.text)` or `onTranscriptUpdate(transcript)` which calls `setInput`. | Low | Follow `useTranscribe` pattern: `onTranscriptReceived: (transcript: string) => void` callback that sets the chat input value. |
+| **Browser compatibility detection** | Not all browsers support the required APIs (Web Workers, WASM, potentially SharedArrayBuffer). Must fail gracefully with a clear message, not a cryptic error. | Low | Check for `window.Worker`, `WebAssembly` support at hook initialization. If missing, show `browserNotSupported` toast and don't render the button. Multi-threaded WASM via `SharedArrayBuffer` additionally requires COOP/COEP headers -- this is a deployment concern, and inference falls back to single-threaded mode without them. |
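+
+The language mapping from the table can be sketched as a small pure function. This assumes the pipeline's `language` option accepts two-letter codes such as `de`, as the Worker messages in the architecture notes do; `toWhisperLanguage` is a hypothetical helper name.
+
+```typescript
+// Reduces a BCP 47 locale such as "de-DE" to its primary language subtag ("de").
+function toWhisperLanguage(locale: string): string {
+  const [primary = ''] = locale.split('-');
+  return primary.toLowerCase();
+}
+
+const german = toWhisperLanguage('de-DE');
+const english = toWhisperLanguage('en-US');
+```
+
+Deriving the code from the locale keeps the existing `de-DE` / `en-US` dropdown values unchanged, so the UI can stay shared with `SpeechRecognitionButton`.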
+
+## Differentiators
+
+Features that set the local option apart from the existing cloud alternatives. Not expected but valued.
+
+| Feature | Value Proposition | Complexity | Notes |
+|---------|-------------------|------------|-------|
+| **Privacy badge/indicator** | Visually communicate that audio stays local. This is the entire reason the feature exists. A small "local" or shield icon on/near the button distinguishes it from cloud options and builds user trust. | Low | Add a subtle visual indicator (e.g., small shield icon, different icon variant, or tooltip text "Audio processed locally -- never leaves your browser"). Not a blocking feature, but reinforces the core value proposition at zero cost. |
+| **Model cached/ready indicator** | After first download, show that the model is cached and ready instantly. Removes the "will this take forever?" anxiety on subsequent uses. | Low | Track `modelReady` state. On subsequent loads, Transformers.js serves from the browser Cache API and loads in 1-3 seconds vs the initial download. Could show a brief "Model ready" status or simply skip the progress bar when cached. |
+| **Recording timer display** | Show elapsed time during recording (e.g., "0:42 / 2:00"). ChatGPT and other modern voice UIs show a timer. Gives users confidence their recording is progressing and how much time remains. | Low | Use the existing `startTimeRef` pattern from `useTranscribe`. Render a small timer text near the button. Updates every second via the existing interval. |
+| **Audio level visualization** | A simple waveform or volume meter during recording confirms the mic is picking up audio. Helps users diagnose "is my mic working?" issues without waiting for transcription. | Medium | Use Web Audio API `AnalyserNode` to read frequency/amplitude data from the `MediaStream`. Render as a simple bar or mini waveform. Don't overengineer -- a 3-bar volume indicator is sufficient and much simpler than a full waveform. |
+| **Silence/no-speech detection** | Whisper hallucinates on silence (generates random text). Detecting empty audio before running inference saves time and prevents confusing output. | Medium | Use `AnalyserNode` to check RMS volume during recording. If average volume stays below threshold for entire recording, show "No speech detected" instead of running inference. This is a meaningful UX improvement over both existing cloud options which don't pre-check. |
+| **WebGPU acceleration (when available)** | Transformers.js v3 supports WebGPU. Where available (Chrome 113+), inference is significantly faster. Transparent upgrade without user action. | Medium | Pass `device: 'webgpu'` to the pipeline when `navigator.gpu` is available, fall back to WASM otherwise. The user never selects this -- it's automatic. Worth implementing in v1 since it's a pipeline option, not a separate code path. |
+| **Transcription confidence feedback** | Show the user when transcription quality might be poor (e.g., noisy audio, very short recording). Manages expectations. | High | Whisper returns log probabilities that could be aggregated into a confidence score. However, Transformers.js pipeline API does not expose these in a straightforward way. Defer unless the API makes it easy. |
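+
+The silence check from the table can be sketched as a plain RMS computation. Assumptions: this version runs on the final 16kHz `Float32Array` rather than on live `AnalyserNode` data, `rms` and `isSilence` are hypothetical helper names, and the `0.01` threshold is only a starting value that needs tuning against real microphones.
+
+```typescript
+// Root mean square of PCM samples in the range [-1, 1].
+function rms(samples: Float32Array): number {
+  if (samples.length === 0) return 0;
+  let sumSquares = 0;
+  for (let i = 0; i < samples.length; i++) {
+    const s = samples[i] ?? 0;
+    sumSquares += s * s;
+  }
+  return Math.sqrt(sumSquares / samples.length);
+}
+
+// True when the overall energy is below the threshold, so inference can be skipped.
+function isSilence(samples: Float32Array, threshold = 0.01): boolean {
+  return rms(samples) < threshold;
+}
+```
+
+Checking the whole recording at once is the simplest option. A recording with a short utterance inside long silence can still fall under the threshold, so a windowed check (maximum RMS over short frames) may be the more robust variant.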
+
+## Anti-Features
+
+Features to explicitly NOT build. These would waste effort, add complexity, or harm the product.
+
+| Anti-Feature | Why Avoid | What to Do Instead |
+|--------------|-----------|-------------------|
+| **Real-time streaming transcription in v1** | Whisper is not designed for streaming -- it processes complete audio segments. Attempting chunked real-time transcription adds massive complexity (chunk boundary handling, partial result stitching, overlapping windows) for marginal UX gain in a chat input context where the user types a message and sends it. The existing `speech-to-text` (Web Speech API) already provides real-time transcription for users who need it. | Implement record-then-transcribe. Architect the hook so a future streaming implementation can replace the transcription step without changing the recording or UI layer. |
+| **Model selection by end users** | Exposing model choices (tiny/base/small/medium/large) to end users creates confusion, support burden, and inconsistent experiences. Larger models have prohibitive download sizes for browser use (small=460MB, medium=1.5GB). | Fix whisper-base as the model. If needed later, make it admin-configurable via extension arguments, not user-selectable. |
+| **Offline-first / PWA mode** | The initial model download requires internet. Making the entire app work offline is a separate, much larger concern beyond speech recognition. The library's built-in browser caching already handles the "second use" case. | Cache the model via Transformers.js built-in browser caching (the Cache API by default). First use requires internet; subsequent uses work without re-downloading the model. |
+| **Audio playback before transcription** | Letting users replay their recording before transcribing adds UI complexity (player controls, waveform display) with minimal value in a chat context. Users want text, not audio review. | Transcribe immediately after recording stops. If the result is wrong, user can re-record. |
+| **Custom vocabulary / hotword boosting** | Whisper doesn't support custom vocabularies or hotword boosting in its standard pipeline. Attempting to hack this adds fragility. | Accept Whisper's output as-is. Users can edit the transcript in the textarea before sending. |
+| **Auto-send after transcription** | Automatically sending the message after transcription removes user control. Users need to review and edit before sending, especially with a model that may make errors. | Insert text into textarea. User reviews and presses Enter/send button. This matches both existing implementations. |
+| **Multi-speaker diarization** | Whisper-base doesn't support speaker diarization. In a chat context with one user speaking into their mic, it's irrelevant. | Single-speaker transcription only. |
+| **Audio file upload for transcription** | The feature is about voice input in chat, not batch transcription. Adding file upload creates scope creep and a different UX paradigm. | Microphone recording only. If file transcription is needed, it's a separate feature. |
+
+## Feature Dependencies
+
+```
+Browser Compatibility Detection ─── gates everything
+ │
+ v
+Microphone Permission Handling ──── gates recording
+ │
+ v
+Model Download + Progress Bar ───── gates transcription (can happen in parallel with recording)
+ │
+ v
+Recording (start/stop/timer) ────── gates transcription
+ │
+ v
+Transcription + Progress ────────── gates result insertion
+ │
+ v
+Transcript Insertion ────────────── end state
+
+Language Selection ──────────────── independent, feeds into transcription as parameter
+Silence Detection ───────────────── depends on Recording (uses same MediaStream)
+Audio Level Visualization ───────── depends on Recording (uses same MediaStream)
+WebGPU Detection ────────────────── independent, feeds into model loading as device option
+Privacy Badge ───────────────────── independent, purely visual
+```
+
+**Critical path:** Browser check -> Mic permission -> Model download (can be eager/lazy) -> Record -> Transcribe -> Insert text.
+
+**Key parallelism opportunity:** Model download can begin as soon as the extension is recognized (or on first button click), while mic permission is requested separately. The model should be downloading while the user records, not sequentially.
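+
+One way to get this overlap is to start the model load once, keep the promise, and await it only when transcription actually begins. The sketch below assumes a hypothetical `createModelSession` helper and treats the loader and the inference call as injected functions; none of these names come from the codebase.
+
+```typescript
+// Hypothetical helper: starts the model load once and lets transcription await it later.
+function createModelSession<M>(loadModel: () => Promise<M>) {
+  let modelPromise: Promise<M> | null = null;
+
+  // Safe to call repeatedly (on extension recognition, on first button click, ...).
+  function preload(): Promise<M> {
+    if (modelPromise !== null) return modelPromise;
+    const started = loadModel();
+    modelPromise = started;
+    // Forget a failed load so that a later call can retry.
+    started.catch(() => {
+      modelPromise = null;
+    });
+    return started;
+  }
+
+  // Called after recording stops; waits only for whatever load time remains.
+  async function transcribe<T>(
+    audio: Float32Array,
+    run: (model: M, audio: Float32Array) => Promise<T>,
+  ): Promise<T> {
+    const model = await preload();
+    return run(model, audio);
+  }
+
+  return { preload, transcribe };
+}
+```
+
+Calling `preload()` when the button is first clicked and `transcribe()` when recording stops means the user's speaking time is spent downloading rather than waiting.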
+
+## MVP Recommendation
+
+### Must build (Phase 1):
+
+1. **Microphone toggle button** with recording state (pulse/red/disabled) -- matches existing `TranscribeButton` pattern
+2. **Model download progress bar** -- the critical UX differentiator for local models
+3. **Language selection (de/en)** -- explicit requirement, reuse existing split-button pattern
+4. **Record-then-transcribe flow** with transcription spinner
+5. **Transcript insertion** into chat textarea
+6. **Error handling** for all failure modes (mic denied, download failed, browser unsupported, transcription empty)
+7. **Max duration enforcement** (2 minutes)
+8. **Browser compatibility detection**
+9. **Model caching** (automatic via Transformers.js built-in browser caching, the Cache API by default -- no custom code needed, but surface "model ready" vs "needs download" state)
+
+### Build next (Phase 2 / quick wins after MVP):
+
+1. **Recording timer display** -- low effort, high polish
+2. **Privacy indicator** -- low effort, reinforces value proposition
+3. **Silence/no-speech detection** -- prevents Whisper hallucinations, medium effort
+4. **WebGPU acceleration** -- potentially significant performance gain, medium effort but mostly a config flag
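+
+The WebGPU "config flag" could be chosen with a small detection helper. This is a sketch under assumptions: `pickDevice` is a hypothetical name, and the structural `GpuLike` type stands in for `navigator.gpu` so the function can be exercised outside a browser. It also checks that an adapter is actually returned, since the presence of `navigator.gpu` alone does not guarantee usable hardware.
+
+```typescript
+type InferenceDevice = 'webgpu' | 'wasm';
+
+// Minimal structural stand-in for navigator.gpu (assumption for testability).
+interface GpuLike {
+  requestAdapter(): Promise<unknown>;
+}
+
+async function pickDevice(nav: { gpu?: GpuLike }): Promise<InferenceDevice> {
+  if (!nav.gpu) return 'wasm';
+  try {
+    // requestAdapter() resolves to null when no suitable GPU is available.
+    const adapter = await nav.gpu.requestAdapter();
+    return adapter ? 'webgpu' : 'wasm';
+  } catch {
+    return 'wasm';
+  }
+}
+```
+
+The result would then be passed as the `device` option when the pipeline is created; typing `navigator.gpu` in the real code may require WebGPU type definitions.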
+
+### Defer:
+
+- **Audio level visualization**: Nice but not critical. Medium effort for visual polish only.
+- **Transcription confidence feedback**: API limitations make this hard. Defer until Transformers.js pipeline exposes log probabilities more easily.
+- **Real-time streaming**: Architecturally prepare but do not implement. The existing Web Speech API extension already serves real-time use cases.
+
+## Sources
+
+- [Transformers.js Documentation](https://huggingface.co/docs/transformers.js/index) -- HIGH confidence
+- [Transformers.js GitHub](https://github.com/huggingface/transformers.js/) -- HIGH confidence
+- [Whisper WebGPU Demo (Xenova)](https://huggingface.co/spaces/Xenova/whisper-webgpu) -- HIGH confidence
+- [Offline Whisper in Browser (AssemblyAI)](https://www.assemblyai.com/blog/offline-speech-recognition-whisper-browser-node-js) -- MEDIUM confidence
+- [Browser-Based Whisper System (Dev.to)](https://dev.to/linmingren/building-a-browser-based-speech-to-text-system-with-whisper-ai-23e5) -- MEDIUM confidence
+- [Whisper Hallucination on Silence (GitHub Discussion)](https://github.com/openai/whisper/discussions/1606) -- MEDIUM confidence
+- [COOP/COEP for SharedArrayBuffer (web.dev)](https://web.dev/articles/coop-coep) -- HIGH confidence
+- [W3C Speech Recognition Accessibility](https://www.w3.org/WAI/perspective-videos/voice/) -- HIGH confidence
+- Existing codebase: `useTranscribe.ts`, `useSpeechRecognitionToggle.ts`, `ChatInput.tsx`, `TranscribeButton.tsx`, `SpeechRecognitionButton.tsx` -- PRIMARY source for integration patterns
diff --git a/.planning/research/PITFALLS.md b/.planning/research/PITFALLS.md
new file mode 100644
index 000000000..e7b11c6e6
--- /dev/null
+++ b/.planning/research/PITFALLS.md
@@ -0,0 +1,505 @@
+# Domain Pitfalls
+
+**Domain:** Browser-based speech recognition with Transformers.js (Whisper inference)
+**Researched:** 2026-05-07
+
+## Critical Pitfalls
+
+Mistakes that cause rewrites, broken deployments, or unusable features.
+
+### Pitfall 1: Missing COOP/COEP Headers -- Silent Performance Collapse
+
+**What goes wrong:** Without Cross-Origin-Opener-Policy and Cross-Origin-Embedder-Policy headers, `SharedArrayBuffer` is unavailable. ONNX Runtime Web silently falls back to single-threaded WASM execution. Whisper inference that should take 5-10 seconds can take 20-40 seconds. There is no error and no warning, just a slowdown of up to 4x that developers may not notice until users complain.
+
+**Why it happens:** Browsers gate `SharedArrayBuffer` behind cross-origin isolation (post-Spectre mitigation). The required headers are:
+- `Cross-Origin-Opener-Policy: same-origin`
+- `Cross-Origin-Embedder-Policy: require-corp` (or `credentialless`)
+
+These must be set on both the dev server and the production server. The current Vite config has no custom headers. The Caddyfile (production) has no custom headers either.
+
+**Consequences:**
+- Multi-threaded WASM is disabled; inference runs on a single thread
+- Performance is 2-4x slower for transformer models on multi-core hardware
+- No error is thrown -- `onnxruntime-web` silently degrades
+- Developers may ship thinking performance is "just how browser inference works"
+
+**Warning signs:**
+- `self.crossOriginIsolated` returns `false` in the console
+- `env.backends.onnx.wasm.numThreads` effectively capped at 1 regardless of setting
+- Inference times much slower than benchmarks suggest
+
+**Prevention:**
+1. **Vite dev server:** Add a plugin (not `server.headers`, which does not apply to page requests in dev mode) that sets both headers via middleware:
+   ```typescript
+   // vite.config.ts plugin
+   {
+     name: 'configure-cross-origin-isolation',
+     configureServer(server) {
+       server.middlewares.use((_req, res, next) => {
+         res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
+         res.setHeader('Cross-Origin-Embedder-Policy', 'credentialless');
+         next();
+       });
+     },
+   }
+   ```
+2. **Caddy production:** Add headers to the Caddyfile:
+ ```
+ header Cross-Origin-Opener-Policy "same-origin"
+ header Cross-Origin-Embedder-Policy "credentialless"
+ ```
+3. **Verification:** Check `self.crossOriginIsolated === true` at app startup and log a warning if false.
+4. **Use `credentialless` over `require-corp`** for COEP. The `require-corp` value breaks loading of cross-origin resources (images, fonts, CDN scripts) that do not send a `Cross-Origin-Resource-Policy` header. The `credentialless` value achieves the same cross-origin isolation without breaking third-party resources. Supported in Chrome 96+ and Firefox 119+; Safari support is listed here as 18+, but confirm it against the MDN compatibility table before relying on it.
+
+**Detection:** Add a runtime check early in the worker initialization:
+```typescript
+if (!self.crossOriginIsolated) {
+ console.warn('Cross-origin isolation not enabled. WASM multi-threading disabled. Whisper inference will be significantly slower.');
+}
+```
+
+**Phase mapping:** Must be addressed in Phase 1 (infrastructure/scaffolding) before any inference work begins. Retrofitting headers after other features depend on the current header configuration is painful.
+
+**Confidence:** HIGH -- verified via official MDN documentation, ONNX Runtime Web behavior, and multiple Vite issue threads.
+
+---
+
+### Pitfall 2: Vite Bundler Misconfiguration for ONNX Runtime
+
+**What goes wrong:** Vite tries to pre-bundle `onnxruntime-web` during dependency optimization, which either fails outright, produces corrupt bundles, or causes WASM files to be missing at runtime. Separately, Vite's default behavior does not recognize `.onnx` files as assets, causing import resolution failures.
+
+**Why it happens:** `onnxruntime-web` contains WASM binaries and dynamic imports that Vite's esbuild-based optimizer cannot process correctly. Vite's pre-bundling rewrites import paths, which breaks the runtime's internal file resolution for `.wasm` and `.mjs` helper files.
+
+**Consequences:**
+- Build errors: "Failed to resolve onnxruntime-web"
+- Runtime errors: WASM file not found (404) after deployment
+- Blank page with console errors about missing `.wasm` files
+- Intermittent failures that work in dev but break in production
+
+**Warning signs:**
+- Errors mentioning `onnxruntime-web` during `vite build`
+- 404 errors for `.wasm` files in browser network tab
+- Worker initialization fails silently
+
+**Prevention:**
+Add to `vite.config.ts`:
+```typescript
+export default defineConfig({
+  // ... existing config
+  optimizeDeps: {
+    exclude: ['onnxruntime-web'],
+  },
+  assetsInclude: ['**/*.onnx'],
+});
+```
+
+The project already has a precedent for handling WASM files -- `copy-pdfjs-wasm.mjs` copies pdfjs WASM files to `public/`. If the ONNX runtime WASM files also need to be served as static assets, follow the same pattern with a `copy-onnx-wasm.mjs` script.
+
+**Phase mapping:** Phase 1 (project scaffolding). Must be in place before the first `import { pipeline } from '@huggingface/transformers'` is written.
+
+**Confidence:** HIGH -- verified via Vite GitHub discussion #15962 and official Transformers.js documentation for Next.js (analogous configuration).
+
+---
+
+### Pitfall 3: Web Worker Construction Pattern Must Be Syntactically Exact
+
+**What goes wrong:** Vite uses static analysis to detect Web Worker construction. If the `new URL(...)` and `new Worker(...)` calls are separated, abstracted, or dynamically constructed, Vite does not bundle the worker file. The worker path resolves to a raw `.ts` file in dev (which works) but a missing or unbundled file in production (which breaks).
+
+**Why it happens:** Vite's `vite:worker-import-meta-url` plugin requires the exact syntactic pattern:
+```typescript
+new Worker(new URL('./worker.ts', import.meta.url), { type: 'module' })
+```
+The `new URL()` must be the direct first argument to `new Worker()`. Extracting the URL into a variable, using a ternary, or wrapping in a factory function breaks detection.
+
+**Consequences:**
+- Works perfectly in `vite dev`, breaks silently in `vite build`
+- Worker file is served as-is (unbundled) or results in 404 in production
+- TypeScript `.ts` worker files get served with wrong MIME type (`video/mp2t`)
+
+**Warning signs:**
+- Worker loads fine in dev, fails in production build
+- Network tab shows `.ts` file being requested instead of bundled `.js`
+- Console error: "Failed to construct Worker"
+
+**Prevention:**
+- Always use the one-liner pattern, never refactor URL construction:
+  ```typescript
+  // CORRECT
+  const worker = new Worker(
+    new URL('./whisper.worker.ts', import.meta.url),
+    { type: 'module' }
+  );
+
+  // WRONG -- Vite cannot detect this
+  const url = new URL('./whisper.worker.ts', import.meta.url);
+  const worker = new Worker(url, { type: 'module' });
+  ```
+- Test the production build (`vite build && vite preview`) early, not just dev mode.
+
+**Phase mapping:** Phase 2 (Web Worker implementation). This pattern must be understood before writing the worker integration.
+
+**Confidence:** HIGH -- verified via multiple Vite GitHub issues (#5979, #10837, #17766, #11823).
+
+---
+
+### Pitfall 4: Memory Leak from Pipeline Not Being Disposed
+
+**What goes wrong:** Transformers.js pipeline objects hold large typed arrays (the full model weights, ~140MB for whisper-base). These are not garbage-collected when a React component unmounts. The model stays in memory until the tab is closed or the worker is terminated. On repeated navigation to/from the transcription feature, memory grows unboundedly.
+
+**Why it happens:** The pipeline singleton pattern (recommended by Transformers.js docs) keeps the model loaded. This is intentional for performance (avoids re-downloading), but becomes a problem when:
+1. The worker is terminated and recreated on component unmount/remount
+2. React Strict Mode double-mounts components in development
+3. `pipeline.dispose()` is never called
+
+**Consequences:**
+- Memory usage grows with each navigation to/from the feature
+- On mobile devices (especially Android Chrome), the tab crashes ("Aw, Snap!")
+- On desktop, memory usage reaches 500MB+ after a few cycles
+- Zombie WASM sessions block reloading of Whisper on Android
+
+**Warning signs:**
+- Chrome DevTools Memory tab shows growing heap after unmount/remount cycles
+- Android Chrome crashes after using the feature 2-3 times
+- `performance.memory.usedJSHeapSize` (Chrome-only) keeps increasing
+
+**Prevention:**
+1. **Use a persistent worker** -- do NOT terminate the worker on component unmount. Create it once at app level, communicate via messages. The worker holds the singleton pipeline across the app lifecycle.
+2. **If the worker must be terminated**, call `pipeline.dispose()` inside the worker before `self.close()`:
+   ```typescript
+   // In worker
+   self.addEventListener('message', async (event) => {
+     if (event.data.type === 'dispose') {
+       const pipe = await PipelineSingleton.getInstance();
+       await pipe.dispose();
+       PipelineSingleton.instance = null;
+       self.close();
+     }
+   });
+   ```
+3. **Guard against React Strict Mode double-mount**: Use a ref to track initialization state and avoid creating duplicate workers.
+4. **Keep pipeline arguments stable**: The model ID and task strings must be constants, not dynamically constructed values that change on every render.
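+
+The "create it once at app level" advice in item 1 can be sketched with a lazy singleton accessor that lives outside any component, so mount and unmount cycles (including Strict Mode's double mount) never create a second worker. The `lazySingleton` helper is hypothetical and deliberately generic so it can be exercised without a browser `Worker`.
+
+```typescript
+// Hypothetical helper: runs the factory at most once and caches the result.
+function lazySingleton<T>(create: () => T): () => T {
+  let instance: T | undefined;
+  let created = false;
+  return () => {
+    if (!created) {
+      instance = create();
+      created = true;
+    }
+    return instance as T;
+  };
+}
+
+// Assumed usage at module level (browser only), keeping Vite's one-liner pattern intact:
+// export const getWhisperWorker = lazySingleton(
+//   () => new Worker(new URL('./whisper.worker.ts', import.meta.url), { type: 'module' }),
+// );
+```
+
+Components would call the accessor inside effects and attach or detach message listeners on unmount instead of terminating the worker.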
+
+**Phase mapping:** Phase 2 (Web Worker implementation) for initial architecture. Phase 3 (integration) for lifecycle management with React components.
+
+**Confidence:** HIGH -- verified via Transformers.js issues #715, #860, #958, and official test patterns showing `model.dispose()` in `afterAll` hooks.
+
+---
+
+### Pitfall 5: Audio Format Conversion -- Wrong Sample Rate or Channel Count
+
+**What goes wrong:** Whisper expects 16kHz mono Float32Array PCM audio. MediaRecorder produces WebM/Opus at the device's native sample rate (typically 44.1kHz or 48kHz), often in stereo. If the audio is not properly resampled and downmixed to mono, Whisper produces garbage output -- not an error, just wrong transcriptions.
+
+**Why it happens:** The conversion pipeline has multiple steps, each of which can silently produce incorrect data:
+1. MediaRecorder outputs compressed WebM/Opus blobs
+2. `AudioContext.decodeAudioData()` decodes to PCM at the AudioContext's sample rate
+3. Channel downmixing (stereo to mono) must extract channel 0 or average channels
+4. Resampling from native rate to 16kHz must use proper interpolation
+
+Developers often skip step 4 (assuming the AudioContext handles it) or get step 3 wrong (passing stereo data to Whisper).
+
+**Consequences:**
+- Whisper "works" but outputs nonsensical text
+- Difficult to debug because there is no error -- just wrong output
+- Quality varies between browsers (different default sample rates)
+
+**Warning signs:**
+- Transcription quality varies dramatically between browsers
+- Short recordings work but longer ones produce gibberish
+- German text produces English fragments or vice versa
+
+**Prevention:**
+Use `OfflineAudioContext` for reliable resampling:
+```typescript
+async function convertToWhisperFormat(audioBlob: Blob): Promise<Float32Array> {
+  const arrayBuffer = await audioBlob.arrayBuffer();
+  const audioContext = new AudioContext();
+  let audioBuffer: AudioBuffer;
+  try {
+    audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
+  } finally {
+    // The context was only needed for decoding; release it.
+    await audioContext.close();
+  }
+
+  // Resample to 16kHz mono
+  const targetSampleRate = 16000;
+  const numSamples = Math.round(audioBuffer.duration * targetSampleRate);
+  const offlineCtx = new OfflineAudioContext(1, numSamples, targetSampleRate);
+
+  const source = offlineCtx.createBufferSource();
+  source.buffer = audioBuffer;
+  source.connect(offlineCtx.destination);
+  source.start(0);
+
+  const resampled = await offlineCtx.startRendering();
+  return resampled.getChannelData(0); // mono Float32Array at 16kHz
+}
+```
+
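+Do NOT attempt manual resampling with linear interpolation -- `OfflineAudioContext` delegates to the browser's native resampler (the exact algorithm is implementation-defined, but it is generally higher quality than naive linear interpolation) and handles edge cases.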
+
+**Phase mapping:** Phase 2 (audio pipeline). This conversion function is foundational and must be correct before any Whisper integration testing.
+
+**Confidence:** HIGH -- verified via Whisper model documentation (16kHz requirement), Web Audio API spec, and Transformers.js server-side audio processing guide.
+
+---
+
+## Moderate Pitfalls
+
+### Pitfall 6: Model Download Without Progress Feedback Feels Broken
+
+**What goes wrong:** The whisper-base model is ~140MB. On first use, the download takes 10-60 seconds depending on connection speed. Without a progress indicator, users think the app is frozen, click repeatedly, or navigate away (canceling the download).
+
+**Why it happens:** The `pipeline()` function accepts a `progress_callback` but developers often forget to wire it up, or they wire it to `console.log` and forget to build UI. The callback fires per-file (encoder, decoder, tokenizer), not as a single unified progress bar.
+
+**Prevention:**
+1. Wire `progress_callback` to a UI progress bar from day one
+2. Aggregate progress across multiple files (encoder.onnx, decoder.onnx, etc.) into a single percentage
+3. Show estimated download size before the user initiates (~140MB)
+4. Cache status check: In Transformers.js v4, use `ModelRegistry.is_pipeline_cached()` to skip the progress UI on subsequent loads
+5. Disable the record button while the model is loading
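+
+Aggregating the per-file callbacks into one percentage (item 2) could look like the sketch below. It assumes the progress events carry `status`, `file`, `loaded`, and `total` fields; that shape should be confirmed against the installed Transformers.js version, and `createProgressAggregator` is a hypothetical name.
+
+```typescript
+// Assumed event shape; verify against the installed library version.
+interface ProgressEvent {
+  status: string;
+  file?: string;
+  loaded?: number;
+  total?: number;
+}
+
+// Returns a callback that maps each event to an overall percentage (0-100).
+function createProgressAggregator(): (event: ProgressEvent) => number {
+  const files = new Map<string, { loaded: number; total: number }>();
+
+  return (event) => {
+    if (event.status === 'progress' && event.file && event.total) {
+      files.set(event.file, { loaded: event.loaded ?? 0, total: event.total });
+    }
+    let loaded = 0;
+    let total = 0;
+    files.forEach((entry) => {
+      loaded += entry.loaded;
+      total += entry.total;
+    });
+    return total === 0 ? 0 : Math.round((loaded / total) * 100);
+  };
+}
+```
+
+Because the total grows as new files are announced, the percentage can briefly drop; the UI may want to clamp it so the bar never moves backwards.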
+
+**Phase mapping:** Phase 2 (model loading UX). Should be implemented alongside the first pipeline initialization, not deferred.
+
+**Confidence:** HIGH -- verified via Transformers.js progress_callback API and v4 ModelRegistry API.
+
+---
+
+### Pitfall 7: COEP `require-corp` Breaks Existing Cross-Origin Resources
+
+**What goes wrong:** Setting `Cross-Origin-Embedder-Policy: require-corp` causes all cross-origin no-cors requests to require a `Cross-Origin-Resource-Policy: cross-origin` header on the response. External images, fonts, CDN resources, and embedded iframes that lack this header stop loading. The app partially breaks in ways unrelated to Whisper.
+
+**Why it happens:** `require-corp` is the well-known COEP value, and many tutorials recommend it. But it has a blast radius far beyond the Whisper feature -- it affects every resource the page loads.
+
+**Consequences:**
+- Mantine UI fonts from CDN may stop loading
+- External images in chat messages break (403/blocked)
+- Third-party scripts fail
+- Existing features regress while adding the new Whisper feature
+
+**Warning signs:**
+- Console errors: "blocked by Cross-Origin-Embedder-Policy"
+- Broken images/fonts after deploying header changes
+- Third-party integrations fail
+
+**Prevention:**
+- Use `Cross-Origin-Embedder-Policy: credentialless` instead of `require-corp`. It achieves the same cross-origin isolation for `SharedArrayBuffer` but does not require cross-origin resources to have CORP headers. It simply strips credentials from no-cors cross-origin requests.
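+- Browser support for `credentialless`: Chrome 96+ and Firefox 119+; confirm the Safari 18+ claim against the MDN compatibility table at implementation time. If the target browsers are covered, this is sufficient for a modern enterprise app.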
+- Test with `require-corp` first in dev to identify any resources that would break, then switch to `credentialless` for deployment.
+
+**Phase mapping:** Phase 1 (header configuration). Must be tested against the entire existing app, not just the Whisper feature.
+
+**Confidence:** HIGH -- verified via MDN COEP documentation and Chrome Developer Blog.
+
+---
+
+### Pitfall 8: Transferable Objects Not Used for Audio Data
+
+**What goes wrong:** When posting audio data from the main thread to the Web Worker via `postMessage`, the Float32Array is copied (structured clone) rather than transferred. For 2 minutes of 16kHz mono audio, this is ~7.7MB -- a copy takes noticeable time and briefly doubles memory usage.
+
+**Why it happens:** Developers write `worker.postMessage({ audio: float32Array })` without the second argument specifying transferable objects. The structured clone algorithm copies the entire buffer.
+
+**Prevention:**
+```typescript
+// WRONG -- copies the buffer
+worker.postMessage({ type: 'transcribe', audio: audioData });
+
+// CORRECT -- transfers ownership (zero-copy)
+worker.postMessage(
+  { type: 'transcribe', audio: audioData },
+  [audioData.buffer] // Transfer the underlying ArrayBuffer
+);
+// audioData is now detached (unusable) in the main thread
+```
+
+Note: After transfer, the original `audioData` in the main thread is detached (its length becomes 0) and can no longer be used. This is fine for the record-then-transcribe pattern since the main thread no longer needs the audio.
+
+**Phase mapping:** Phase 2 (Web Worker communication). Simple to get right if known, annoying to debug if missed.
+
+**Confidence:** HIGH -- verified via MDN Transferable Objects documentation.
+
+---
+
+### Pitfall 9: Transformers.js Version and Model ID Confusion
+
+**What goes wrong:** Developers use the wrong package name, version, or model ID. The npm package is `@huggingface/transformers` (v3/v4), NOT the old `@xenova/transformers` (v2, deprecated). Model IDs have shifted from `Xenova/whisper-base` to `onnx-community/whisper-base` for v4-optimized models.
+
+**Why it happens:** Many tutorials, Stack Overflow answers, and blog posts reference the v2 package (`@xenova/transformers`) and `Xenova/` model IDs. The library underwent a significant rebranding and restructuring.
+
+**Consequences:**
+- Installing `@xenova/transformers` gets v2 (deprecated, missing features)
+- Using `Xenova/whisper-base` model ID may load older, unoptimized ONNX exports
+- Version mismatches between `@huggingface/transformers` and `onnxruntime-web` cause cryptic errors
+
+**Prevention:**
+- Use `@huggingface/transformers` (v3 stable, v4 latest)
+- Use `onnx-community/whisper-base` as the model ID for current ONNX exports
+- Pin `onnxruntime-web` to the version that `@huggingface/transformers` depends on (check `package.json` peer deps) -- do NOT install a separate version
+- If using v3, be aware that `onnxruntime-web` versions above 1.19.x have reported compatibility issues
+
+**Phase mapping:** Phase 1 (dependency installation). Get this right on day one.
+
+**Confidence:** MEDIUM -- model ID ecosystem is actively evolving; verify the latest recommended model ID against Hugging Face Hub at implementation time.
+
+---
+
+### Pitfall 10: ONNX Runtime WASM Multi-Threading Bug
+
+**What goes wrong:** Even with `SharedArrayBuffer` available, setting `numThreads` greater than 1 may cause hangs or crashes due to a known bug in certain versions of `onnxruntime-web`.
+
+**Why it happens:** There is a documented bug (`microsoft/onnxruntime#14445`) where WASM multi-threading causes deadlocks or incorrect results in some onnxruntime-web versions.
+
+**Prevention:**
+- Start with `env.backends.onnx.wasm.numThreads = 1` for reliability
+- Test with higher thread counts only after verifying the specific onnxruntime-web version supports it
+- Cap threads: `Math.min(navigator.hardwareConcurrency || 4, 8)` to avoid degradation on high-core machines
+- Monitor the onnxruntime-web changelog for fixes before enabling multi-threading
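+
+The thread-count rules above can be collected into one decision function. This is a sketch: `pickThreadCount` and its `allowMultiThreading` flag are hypothetical, with the flag standing in for "this onnxruntime-web version has been verified".
+
+```typescript
+// Hypothetical helper combining the single-thread default with the 8-thread cap.
+function pickThreadCount(
+  crossOriginIsolated: boolean,
+  hardwareConcurrency: number | undefined,
+  allowMultiThreading: boolean,
+): number {
+  // Without cross-origin isolation there is no SharedArrayBuffer, so one thread only.
+  if (!crossOriginIsolated || !allowMultiThreading) return 1;
+  return Math.max(1, Math.min(hardwareConcurrency || 4, 8));
+}
+```
+
+The result would be assigned to `env.backends.onnx.wasm.numThreads` before the pipeline is created, with `allowMultiThreading` left `false` for the MVP.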
+
+**Phase mapping:** Phase 3 (performance optimization). Single-threaded is fine for MVP; multi-threading is an optimization.
+
+**Confidence:** MEDIUM -- the bug status changes with onnxruntime-web releases; verify against the version bundled with your Transformers.js version.
+
+---
+
+## Minor Pitfalls
+
+### Pitfall 11: MediaRecorder MIME Type Varies By Browser
+
+**What goes wrong:** `MediaRecorder` supports different audio codecs across browsers. `audio/webm;codecs=opus` works in Chrome and Firefox but not Safari. Safari supports `audio/mp4` instead. Hardcoding `audio/webm` causes recording to fail on Safari.
+
+**Prevention:**
+```typescript
+const mimeType = MediaRecorder.isTypeSupported('audio/webm;codecs=opus')
+  ? 'audio/webm;codecs=opus'
+  : MediaRecorder.isTypeSupported('audio/mp4')
+    ? 'audio/mp4'
+    : 'audio/webm';
+```
+The existing `useTranscribe.ts` hardcodes `audio/webm`. The new local transcription hook should use the same pattern but with the fallback above.
+
+**Phase mapping:** Phase 2 (audio recording). The existing `useTranscribe.ts` can serve as a starting point, but needs the MIME type fallback.
+
+**Confidence:** HIGH -- well-documented browser API difference.
+
+---
+
+### Pitfall 12: React Strict Mode Double-Mount Creates Duplicate Workers
+
+**What goes wrong:** With React Strict Mode enabled, development builds run each `useEffect` twice (mount, unmount, remount). If the effect creates a Web Worker and the cleanup does not terminate it, two workers are created, both trying to load the 140MB model simultaneously. This doubles download bandwidth and memory.
+
+**Prevention:**
+- Use a ref to track whether the worker has already been initialized
+- Use `useRef` to hold the worker instance and only create it if `null`
+- The cleanup function must properly terminate the worker on unmount
+- The existing codebase already handles this pattern in `useTranscribe.ts` with `mediaRecorderRef`
+
+```typescript
+import { useEffect, useRef } from 'react';
+
+// Inside the hook or component body:
+const workerRef = useRef<Worker | null>(null);
+useEffect(() => {
+  if (workerRef.current) return; // Already initialized
+  workerRef.current = new Worker(
+    new URL('./whisper.worker.ts', import.meta.url),
+    { type: 'module' }
+  );
+  return () => {
+    workerRef.current?.terminate();
+    workerRef.current = null;
+  };
+}, []);
+
+**Phase mapping:** Phase 2 (React integration). Standard React pattern but critical for ML workloads.
+
+**Confidence:** HIGH -- standard Strict Mode behavior in React 18 and later.
+
+---
+
+### Pitfall 13: Model Not Cached Across Sessions on Some Browsers
+
+**What goes wrong:** Transformers.js caches model files in the browser's Cache API or IndexedDB. Some browsers have aggressive storage eviction policies. Safari in particular may evict cached data when under storage pressure, forcing a re-download of the 140MB model.
+
+**Prevention:**
+- Use `navigator.storage.persist()` to request persistent storage (reduces eviction risk)
+- Check cache status before recording starts (using ModelRegistry API in v4 or a manual cache check in v3)
+- Show download size estimate if the model needs re-downloading
+- Consider a "preload model" button in settings rather than lazy-loading on first record
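+
+Requesting persistent storage could be wrapped as below. This is a sketch: `ensurePersistentStorage` is a hypothetical name, and the structural `StorageLike` type stands in for `navigator.storage` so the function tolerates browsers where the API is missing.
+
+```typescript
+// Minimal structural stand-in for navigator.storage (assumption for testability).
+interface StorageLike {
+  persisted?: () => Promise<boolean>;
+  persist?: () => Promise<boolean>;
+}
+
+// Resolves to true when storage is (or becomes) persistent, false otherwise.
+async function ensurePersistentStorage(
+  storage: StorageLike | undefined,
+): Promise<boolean> {
+  if (!storage || !storage.persist) return false;
+  if (storage.persisted && (await storage.persisted())) return true;
+  // The browser may grant or deny this silently or after a prompt.
+  return storage.persist();
+}
+```
+
+A `false` result is not an error; it only means the cached model remains subject to normal eviction, so the "needs download" state should still be handled.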
+
+**Phase mapping:** Phase 3 (UX polish). Not critical for MVP but important for production.
+
+**Confidence:** MEDIUM -- storage eviction behavior varies by browser and is not fully documented.
+
+---
+
+### Pitfall 14: Whisper Language Parameter Must Be Set Correctly
+
+**What goes wrong:** Whisper-base is multilingual. Without specifying `language: 'de'` or `language: 'en'`, the model auto-detects the language, which is unreliable for short recordings and may produce mixed-language output.
+
+**Prevention:**
+Pass the language explicitly to the pipeline:
+```typescript
+const result = await transcriber(audioData, {
+  language: selectedLanguage, // 'de' or 'en'
+  task: 'transcribe',
+});
+```
+
+The project spec calls for a language dropdown (de/en) matching the existing speech recognition UI. Wire this value through to the pipeline call.
+
+**Phase mapping:** Phase 2 (pipeline configuration). Simple but easy to forget.
+
+**Confidence:** HIGH -- documented Whisper pipeline parameter.
+
+---
+
+### Pitfall 15: Mobile Browser Crashes with whisper-base
+
+**What goes wrong:** whisper-base (~140MB model weights) requires significant memory for inference. On mobile devices (especially Android Chrome), this frequently causes tab crashes ("Aw, Snap!") during transcription of longer audio.
+
+**Why it happens:** Mobile browsers have stricter memory limits than desktop browsers. The model weights + audio buffer + intermediate tensors can exceed the tab's memory budget on devices with limited RAM.
+
+**Prevention:**
+- Consider whisper-tiny (~75MB) as a fallback for mobile devices (detect via `navigator.userAgent` or screen size)
+- Set the 2-minute recording limit strictly on mobile
+- Use chunked processing (`chunk_length_s: 30, stride_length_s: 5`) for any audio longer than 30 seconds
+- Add a try/catch around the transcription call with a user-friendly "transcription failed, please try a shorter recording" message
+- Consider disabling the local transcription feature entirely on mobile for v1
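+
+The chunking and try/catch points could be combined in one wrapper around the pipeline call. This is a sketch under assumptions: `transcribeSafely` and the `Transcriber` type are hypothetical, and the transcriber is injected so the wrapper can be tested without loading a model. Note that a tab killed for exceeding memory cannot be caught this way; the wrapper only covers errors the runtime actually throws.
+
+```typescript
+// Hypothetical structural type for the ASR pipeline function.
+type Transcriber = (
+  audio: Float32Array,
+  options: Record<string, unknown>,
+) => Promise<{ text: string }>;
+
+type TranscribeOutcome =
+  | { ok: true; text: string }
+  | { ok: false; message: string };
+
+const SAMPLE_RATE = 16000; // Whisper input rate
+
+async function transcribeSafely(
+  transcriber: Transcriber,
+  audio: Float32Array,
+  language: 'de' | 'en',
+): Promise<TranscribeOutcome> {
+  const options: Record<string, unknown> = { language, task: 'transcribe' };
+  // Enable chunked processing only for audio longer than 30 seconds.
+  if (audio.length / SAMPLE_RATE > 30) {
+    options.chunk_length_s = 30;
+    options.stride_length_s = 5;
+  }
+  try {
+    const result = await transcriber(audio, options);
+    return { ok: true, text: result.text.trim() };
+  } catch {
+    return {
+      ok: false,
+      message: 'Transcription failed, please try a shorter recording.',
+    };
+  }
+}
+```
+
+Returning a tagged result instead of throwing lets the hook map failures to a toast without a second try/catch at the call site.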
+
+**Phase mapping:** Phase 3 (cross-device testing). Accept mobile limitations for MVP; optimize later.
+
+**Confidence:** HIGH -- verified via Transformers.js issues #740, #988 (Chrome crashes on Android).
+
+---
+
+## Phase-Specific Warnings
+
+| Phase Topic | Likely Pitfall | Mitigation |
+|-------------|---------------|------------|
+| Infrastructure / Scaffolding | COOP/COEP headers missing (Pitfall 1, 7) | Add headers to Vite plugin AND Caddyfile in the first PR. Use `credentialless`. Test entire app for regressions. |
+| Infrastructure / Scaffolding | Vite bundler misconfiguration (Pitfall 2) | Add `optimizeDeps.exclude` and `assetsInclude` to `vite.config.ts` before first import. |
+| Infrastructure / Scaffolding | Wrong package/model ID (Pitfall 9) | Use `@huggingface/transformers` (not `@xenova/transformers`). Use `onnx-community/whisper-base`. |
+| Web Worker + Pipeline | Worker pattern breaks in production (Pitfall 3) | Use exact `new Worker(new URL(...))` one-liner. Test `vite build` early. |
+| Web Worker + Pipeline | Memory leak from undisposed pipeline (Pitfall 4) | Use persistent worker with singleton pattern. Call `dispose()` only on app shutdown. |
+| Web Worker + Pipeline | Audio format conversion wrong (Pitfall 5) | Use `OfflineAudioContext` for resampling. Validate 16kHz mono output. |
+| Web Worker + Pipeline | No progress feedback during model load (Pitfall 6) | Wire `progress_callback` to UI from day one. |
+| Web Worker + Pipeline | Structured clone instead of transfer (Pitfall 8) | Use transferable objects in `postMessage`. |
+| Web Worker + Pipeline | Strict Mode double-mount (Pitfall 12) | Guard worker creation with ref check. |
+| Integration / Polish | Mobile crashes (Pitfall 15) | Chunked processing, shorter limits, graceful fallback. |
+| Integration / Polish | Cache eviction (Pitfall 13) | Request persistent storage, check cache before recording. |
+| Integration / Polish | WASM threading bug (Pitfall 10) | Start single-threaded, optimize later. |
+| Integration / Polish | MIME type browser differences (Pitfall 11) | Use `isTypeSupported()` fallback chain. |
+
+## Sources
+
+- [MDN: Cross-Origin-Embedder-Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Cross-Origin-Embedder-Policy)
+- [MDN: Transferable Objects](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Transferable_objects)
+- [Vite Issue #3909: COOP/COEP headers on dev server](https://github.com/vitejs/vite/issues/3909)
+- [Vite Issue #16536: COOP/COEP on HMR dev server](https://github.com/vitejs/vite/issues/16536)
+- [Vite Discussion #15962: ONNX file loading](https://github.com/vitejs/vite/discussions/15962)
+- [Vite Issue #5979: Worker code not bundled](https://github.com/vitejs/vite/issues/5979)
+- [Vite Issue #10837: Worker import.meta.url in 3rd party modules](https://github.com/vitejs/vite/issues/10837)
+- [Transformers.js Issue #860: WebGPU Whisper memory leak](https://github.com/huggingface/transformers.js/issues/860)
+- [Transformers.js Issue #958: Zombie memory on page close/reopen](https://github.com/huggingface/transformers.js/issues/958)
+- [Transformers.js Issue #988: Chrome crash with Whisper](https://github.com/huggingface/transformers.js/issues/988)
+- [Transformers.js Issue #740: Android Chrome crash](https://github.com/huggingface/transformers.js/issues/740)
+- [Transformers.js Issue #715: How to unload/destroy a pipeline](https://github.com/huggingface/transformers.js/issues/715)
+- [Transformers.js Issue #1016: onnxruntime-web version compatibility](https://github.com/huggingface/transformers.js/issues/1016)
+- [Transformers.js Issue #882: WASM multi-threading](https://github.com/huggingface/transformers.js/issues/882)
+- [Transformers.js v4 Release Notes](https://huggingface.co/blog/transformersjs-v4)
+- [Transformers.js Official Docs: Web Worker pattern](https://huggingface.co/docs/transformers.js/index) (Context7 verified)
+- [Chrome Developer Blog: COEP credentialless](https://developer.chrome.com/blog/coep-credentialless-origin-trial)
+- [web.dev: COOP and COEP](https://web.dev/articles/coop-coep)
+- [vite-plugin-cross-origin-isolation (npm)](https://www.npmjs.com/package/vite-plugin-cross-origin-isolation)
diff --git a/.planning/research/STACK.md b/.planning/research/STACK.md
new file mode 100644
index 000000000..0e78632bc
--- /dev/null
+++ b/.planning/research/STACK.md
@@ -0,0 +1,327 @@
+# Technology Stack: Local Browser-Based Speech Recognition with Transformers.js
+
+**Project:** c4 GenAI Suite -- Local Whisper Speech Recognition
+**Researched:** 2026-05-07
+**Overall Confidence:** HIGH
+
+## Recommended Stack
+
+### Core Library
+
+| Technology | Version | Purpose | Why | Confidence |
+|------------|---------|---------|-----|------------|
+| `@huggingface/transformers` | `^4.2.0` | ML inference runtime (Whisper ASR in browser) | Latest stable. v4 released March 2025, actively maintained (4.0 -> 4.2 through May 2026). New ModelRegistry API for cache management and progress tracking is directly needed for the download UX requirement. Monorepo restructure makes the package lighter (~53% smaller web bundle vs v3). WebGPU runtime rewritten in C++ for better performance. | HIGH |
+
+### ONNX Model
+
+| Model | Repository | Purpose | Why | Confidence |
+|-------|------------|---------|-----|------------|
+| Whisper Base (ONNX) | `onnx-community/whisper-base` | Pre-converted ONNX model for browser inference | Official onnx-community conversion of openai/whisper-base. ~140MB total (encoder + decoder). 36K+ monthly downloads. Used by 23+ HF Spaces. No manual ONNX conversion needed. Per-module dtype control available (keep encoder at fp32, quantize decoder to q8 for quality/size tradeoff). | HIGH |
+
+### Audio Capture
+
+| Technology | Version | Purpose | Why | Confidence |
+|------------|---------|---------|-----|------------|
+| MediaRecorder API | Browser built-in | Record audio from microphone | Already used in existing `useTranscribe` hook -- proven pattern in this codebase. Typically records `audio/webm` blobs, but the container varies by browser, so select the MIME type with an `isTypeSupported()` fallback chain (Pitfall 11). Supported in all modern browsers. | HIGH |
+| AudioContext / OfflineAudioContext | Browser built-in | Decode and resample audio to 16kHz Float32Array | Whisper requires 16kHz mono Float32Array input. AudioContext.decodeAudioData() decodes webm blobs. OfflineAudioContext handles resampling to exact 16000Hz. No external library needed. | HIGH |
+
+### Web Worker
+
+| Technology | Version | Purpose | Why | Confidence |
+|------------|---------|---------|-----|------------|
+| Native Web Worker (ES Module) | Browser built-in | Run Whisper inference off main thread | Mandatory -- Whisper inference takes seconds and would freeze the UI. Vite natively supports `new Worker(new URL('./worker.ts', import.meta.url), { type: 'module' })` syntax with full TypeScript and import support. No bundler plugin needed. | HIGH |
+
+### Build Tooling
+
+| Technology | Version | Purpose | Why | Confidence |
+|------------|---------|---------|-----|------------|
+| Vite (existing) | `8.0.8` | Build tool, dev server | Already in use. Natively handles Web Worker bundling with `import.meta.url` pattern. Needs COOP/COEP header configuration for optimal WASM multi-threading (see Infrastructure section). | HIGH |
+
+### Infrastructure / Headers
+
+| Technology | Configuration | Purpose | Why | Confidence |
+|------------|--------------|---------|-----|------------|
+| COOP/COEP Headers | Vite `server.headers` config | Enable SharedArrayBuffer for multi-threaded WASM | Without these headers, ONNX Runtime Web falls back to single-threaded WASM (3-4x slower). Required headers: `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp` (or `credentialless` if `require-corp` breaks existing cross-origin resources; see the COEP impact note under Vite Configuration Addition). In dev: simple Vite server config. In production: web server/CDN config. | HIGH |
+
+## Detailed Implementation Notes
+
+### Transformers.js v4 Pipeline API
+
+The primary API is the `pipeline()` function. For Whisper ASR:
+
+```typescript
+// In Web Worker (worker.ts)
+import { pipeline, type AutomaticSpeechRecognitionPipeline } from "@huggingface/transformers";
+
+let transcriber: AutomaticSpeechRecognitionPipeline | null = null;
+
+async function loadModel(onProgress: (data: unknown) => void) {
+  transcriber = await pipeline(
+    "automatic-speech-recognition",
+    "onnx-community/whisper-base",
+    {
+      dtype: {
+        encoder_model: "fp32", // encoder is sensitive to quantization
+        decoder_model_merged: "q8", // decoder tolerates quantization well
+      },
+      device: "wasm", // "webgpu" for GPU acceleration where available
+      progress_callback: onProgress,
+    },
+  );
+}
+
+async function transcribe(audioData: Float32Array, language: string) {
+  if (!transcriber) throw new Error("Model not loaded");
+  const result = await transcriber(audioData, {
+    language,
+    task: "transcribe",
+  });
+  return result;
+}
+```
+
+### ModelRegistry API (v4 feature)
+
+Critical for the progress bar / cache management requirements:
+
+```typescript
+import { ModelRegistry, pipeline } from "@huggingface/transformers";
+
+const modelId = "onnx-community/whisper-base";
+
+// Check if model is already cached (skip download prompt)
+const cached = await ModelRegistry.is_pipeline_cached(
+  "automatic-speech-recognition",
+  modelId,
+  { dtype: { encoder_model: "fp32", decoder_model_merged: "q8" } }
+);
+
+// Get total download size for progress UI
+const files = await ModelRegistry.get_pipeline_files(
+  "automatic-speech-recognition",
+  modelId,
+  { dtype: { encoder_model: "fp32", decoder_model_merged: "q8" } }
+);
+const metadata = await Promise.all(
+  files.map(file => ModelRegistry.get_file_metadata(modelId, file))
+);
+const totalBytes = metadata.reduce((sum, m) => sum + m.size, 0);
+
+// Enhanced progress callback with progress_total event
+const pipe = await pipeline("automatic-speech-recognition", modelId, {
+  progress_callback: (e) => {
+    if (e.status === "progress_total") {
+      // e.progress is 0-100 for end-to-end loading
+      self.postMessage({ type: "progress", progress: e.progress });
+    }
+  }
+});
+```
+
+### Web Worker Communication Pattern
+
+Follows the established pattern from the official Transformers.js React tutorial:
+
+```typescript
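+// worker.ts -- singleton pattern
+import {
+  pipeline,
+  type AutomaticSpeechRecognitionPipeline,
+} from "@huggingface/transformers";
+
+class WhisperPipeline {
+  // Cache the promise, not the resolved pipeline, so concurrent callers share one load
+  static instance: Promise<AutomaticSpeechRecognitionPipeline> | null = null;
+
+  static getInstance(progressCallback?: (data: unknown) => void) {
+    this.instance ??= pipeline(
+      "automatic-speech-recognition",
+      "onnx-community/whisper-base",
+      {
+        dtype: { encoder_model: "fp32", decoder_model_merged: "q8" },
+        device: "wasm",
+        progress_callback: progressCallback,
+      },
+    );
+    return this.instance;
+  }
+}
+
+self.addEventListener("message", async (event) => {
+  const { type, data } = event.data;
+
+  try {
+    switch (type) {
+      case "load": {
+        await WhisperPipeline.getInstance((progress) => {
+          self.postMessage({ type: "progress", ...(progress as object) });
+        });
+        self.postMessage({ type: "ready" });
+        break;
+      }
+
+      case "transcribe": {
+        const transcriber = await WhisperPipeline.getInstance();
+        const output = await transcriber(data.audio, {
+          language: data.language,
+          task: "transcribe",
+        });
+        // The pipeline may return a single result or an array of results
+        const text = Array.isArray(output)
+          ? output.map((o) => o.text).join(" ")
+          : output.text;
+        self.postMessage({ type: "result", text });
+        break;
+      }
+    }
+  } catch (err) {
+    // Report failures so the hook is not left waiting for a reply
+    self.postMessage({ type: "error", message: String(err) });
+  }
+});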
+```
+
+```typescript
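+// React hook -- useLocalTranscribe.ts
+const workerRef = useRef<Worker | null>(null);
+
+useEffect(() => {
+  workerRef.current = new Worker(
+    new URL("../workers/whisper.worker.ts", import.meta.url),
+    { type: "module" }
+  );
+  // message handler...
+
+  // Note: terminating on unmount discards the loaded model; the pitfalls
+  // research recommends a persistent worker instead (Pitfalls 4 and 12).
+  return () => {
+    workerRef.current?.terminate();
+    workerRef.current = null;
+  };
+}, []);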
+```
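+
+On the receiving side, the hook can narrow incoming worker messages with a discriminated union before acting on them. The sketch below assumes the message shapes used in the worker snippet above (`progress`, `ready`, `result`); `isWorkerResponse` and `describeResponse` are hypothetical helper names, not part of any library.
+
+```typescript
+// Assumed response shapes, derived from the worker snippet.
+type WorkerResponse =
+  | { type: "progress"; progress?: number }
+  | { type: "ready" }
+  | { type: "result"; text: string };
+
+// Runtime guard for untyped `event.data` arriving from the worker.
+function isWorkerResponse(value: unknown): value is WorkerResponse {
+  if (typeof value !== "object" || value === null) return false;
+  const msg = value as { type?: unknown; text?: unknown };
+  switch (msg.type) {
+    case "progress":
+    case "ready":
+      return true;
+    case "result":
+      return typeof msg.text === "string";
+    default:
+      return false;
+  }
+}
+
+// The switch covers every member of the union, so TypeScript narrows
+// `msg` in each branch without casts.
+function describeResponse(msg: WorkerResponse): string {
+  switch (msg.type) {
+    case "progress":
+      return `loading ${Math.round(msg.progress ?? 0)}%`;
+    case "ready":
+      return "model ready";
+    case "result":
+      return msg.text;
+  }
+}
+```
+
+In the hook, `worker.onmessage` would call `isWorkerResponse(event.data)` first and ignore anything that fails the guard.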
+
+### Audio Processing Pipeline
+
+The audio must be converted from MediaRecorder output (webm blobs) to Whisper's required format (16kHz mono Float32Array):
+
+```typescript
+async function processAudioBlob(blob: Blob): Promise<Float32Array> {
+  const arrayBuffer = await blob.arrayBuffer();
+
+  // Decode the audio using the browser's built-in decoder
+  const audioContext = new AudioContext();
+  let audioBuffer: AudioBuffer;
+  try {
+    audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
+  } finally {
+    // Release the audio resources held by the decoding context
+    await audioContext.close();
+  }
+
+  // Resample to 16kHz using OfflineAudioContext
+  const offlineCtx = new OfflineAudioContext(
+    1, // mono
+    Math.ceil(audioBuffer.duration * 16000), // length at 16kHz
+    16000 // target sample rate
+  );
+
+  const source = offlineCtx.createBufferSource();
+  source.buffer = audioBuffer;
+  source.connect(offlineCtx.destination);
+  source.start(0);
+
+  const resampled = await offlineCtx.startRendering();
+  return resampled.getChannelData(0); // Float32Array at 16kHz
+}
+```
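+
+Because wrong resampling fails silently, a cheap sanity check on the resampled buffer before it is sent to the Worker can help. This is a sketch under assumptions: `validateWhisperInput` is a hypothetical helper and the 1% length tolerance is an arbitrary starting point, not a value from the research.
+
+```typescript
+const WHISPER_SAMPLE_RATE = 16_000;
+
+// Hypothetical check: returns a list of problems, empty when the buffer looks sane.
+function validateWhisperInput(samples: Float32Array, durationS: number): string[] {
+  const problems: string[] = [];
+  const expected = Math.ceil(durationS * WHISPER_SAMPLE_RATE);
+
+  // A buffer still at 44.1kHz or 48kHz is roughly 3x too long for its duration.
+  if (Math.abs(samples.length - expected) > expected * 0.01) {
+    problems.push(`expected ~${expected} samples, got ${samples.length}`);
+  }
+  // NaN or Infinity indicates a broken decode or resample step.
+  if (samples.some((s) => !Number.isFinite(s))) {
+    problems.push("buffer contains non-finite samples");
+  }
+  return problems;
+}
+```
+
+A call such as `validateWhisperInput(samples, audioBuffer.duration)` fits between resampling and `postMessage`; a non-empty result can be logged or turned into an error toast.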
+
+### Vite Configuration Addition
+
+```typescript
+// vite.config.ts -- add to existing config
+export default defineConfig({
+  // ... existing config ...
+  server: {
+    headers: {
+      "Cross-Origin-Opener-Policy": "same-origin",
+      "Cross-Origin-Embedder-Policy": "require-corp",
+    },
+    // ... existing proxy config ...
+  },
+});
+```
+
+**Production note:** These headers must also be set on the production web server / reverse proxy / CDN. Without them, ONNX Runtime falls back to single-threaded WASM but still works -- just slower.
+
+**COEP impact:** `require-corp` may break loading of cross-origin resources (images, fonts, scripts) that don't include a `Cross-Origin-Resource-Policy` header. If this causes issues with existing functionality, use `credentialless` instead of `require-corp` (supported in Chrome 96+, Firefox 119+).
+
+## Model Selection Rationale
+
+| Model | Size (ONNX) | Quality | Inference Speed (browser) | Recommendation |
+|-------|-------------|---------|--------------------------|----------------|
+| whisper-tiny | ~75 MB | Acceptable for English, weak for German | ~2-5s for 30s audio | Too low quality for German |
+| **whisper-base** | **~140 MB** | **Good for de/en** | **~5-15s for 30s audio** | **Selected: best quality/size balance** |
+| whisper-small | ~460 MB | Very good | ~20-40s for 30s audio | Too large for browser download |
+| whisper-medium | ~1.5 GB | Excellent | Impractical in browser | Out of scope |
+
+**Decision:** `whisper-base` because it offers usable German accuracy at an acceptable download size (~140MB one-time). `whisper-tiny` has noticeably worse accuracy for non-English languages. `whisper-small` and larger are too heavy for a browser-download UX.
+
+### Per-Module Quantization
+
+Whisper's encoder is extremely sensitive to quantization -- using q4 or q8 for the encoder significantly degrades transcription quality. The decoder is more tolerant:
+
+| Configuration | Encoder | Decoder | Total Size (approx) | Quality Impact |
+|---------------|---------|---------|---------------------|----------------|
+| Full precision | fp32 | fp32 | ~140 MB | Baseline |
+| **Recommended** | **fp32** | **q8** | **~105 MB** | **Negligible** |
+| Aggressive | fp32 | q4 | ~85 MB | Minor degradation |
+| Bad idea | q8 | q8 | ~75 MB | Significant degradation |
+
+**Decision:** Use `fp32` for encoder, `q8` for decoder. Reduces download by ~25% with negligible quality loss.
+
+## Alternatives Considered
+
+| Category | Recommended | Alternative | Why Not |
+|----------|-------------|-------------|---------|
+| ML Runtime | `@huggingface/transformers` v4 | `onnxruntime-web` directly | Transformers.js wraps ONNX Runtime Web and adds the pipeline API, tokenizer, processor, progress callbacks, and model hub integration. Using ONNX Runtime directly means reimplementing all of that. |
+| ML Runtime | `@huggingface/transformers` v4 | `@xenova/transformers` (v2) | `@xenova/transformers` is the old package name (pre-v3). Unmaintained. All development moved to `@huggingface/transformers`. |
+| ML Runtime | `@huggingface/transformers` v4 | `whisper.cpp` / `whisper-wasm` | Lower-level C/C++ WASM port. Faster raw inference but no pipeline API, no progress callbacks, no model caching, no TypeScript types. Much more integration work. |
+| Model Format | ONNX via onnx-community | TensorFlow.js (TFJS) | Transformers.js uses ONNX natively. No official TFJS Whisper models. ONNX is the standard for browser ML in 2025/2026. |
+| Audio Processing | Web Audio API (AudioContext) | `wavefile` npm package | `wavefile` is needed for Node.js (no Web Audio API). In browser, AudioContext + OfflineAudioContext handle decoding and resampling natively with zero dependencies. |
+| Worker Comms | `postMessage` (native) | Comlink / workerize | Adds dependency for syntactic sugar. The message protocol for Whisper is simple (load, transcribe, progress) -- 3 message types don't justify a library. Matches existing codebase patterns. |
+| Inference Backend | WASM (default) | WebGPU | WebGPU gives better performance but has limited browser support (Chrome 113+, no Firefox stable, no Safari). WASM works everywhere. Start with WASM, add WebGPU as progressive enhancement later. |
+
+## What NOT to Use
+
+| Technology | Why Not |
+|------------|---------|
+| `@xenova/transformers` | Old package name, unmaintained since v3 migration to `@huggingface/transformers`. |
+| `@huggingface/transformers` v3.x | v4 is stable and current (4.2.0). v4 has ModelRegistry API needed for cache/progress UX. No reason to use v3. |
+| `react-speech-recognition` for this feature | That library wraps the Web Speech API (browser-native, cloud-based). The whole point of this feature is local inference. |
+| `wavefile` | Only needed in Node.js. Browser has Web Audio API built in for audio decoding and resampling. |
+| `comlink` or `workerize` | Over-engineering for 3 message types. Native postMessage is clearer and matches existing codebase patterns (no worker libraries currently used). |
+| `@built-in-ai/transformers-js` | Third-party wrapper for Vercel AI SDK integration. Not needed for direct pipeline usage. |
+| WebGPU as default backend | Too limited in browser support for a general-purpose app. Use WASM as default, WebGPU as optional enhancement with feature detection. |
+
+## Installation
+
+```bash
+# Single new dependency
+cd frontend && npm install @huggingface/transformers@^4.2.0
+```
+
+No other new dependencies required. Audio processing uses browser built-ins (MediaRecorder, AudioContext, OfflineAudioContext). Web Worker uses native browser API with Vite's built-in bundling support.
+
+## Browser Compatibility
+
+| Feature | Chrome | Firefox | Safari | Edge | Notes |
+|---------|--------|---------|--------|------|-------|
+| Web Worker (ES modules) | 80+ | 114+ | 15+ | 80+ | Firefox required `dom.workers.modules.enabled` in about:config until Firefox 114 |
+| MediaRecorder | 47+ | 25+ | 14.1+ | 79+ | Already validated by existing transcribe-azure feature |
+| AudioContext | 35+ | 25+ | 14.1+ | 12+ | Universal modern browser support |
+| OfflineAudioContext | 25+ | 25+ | 14.1+ | 12+ | Universal modern browser support |
+| WASM | 57+ | 52+ | 11+ | 16+ | Required for ONNX Runtime Web |
+| SharedArrayBuffer | 68+ | 79+ | 15.2+ | 79+ | Requires COOP/COEP headers. Without it, falls back to single-threaded (slower but functional) |
+| WebGPU (optional) | 113+ | Nightly | No | 113+ | Future enhancement, not required |
+| Cache API | 40+ | 41+ | 11.1+ | 17+ | Used by Transformers.js for model caching |
+
+**Minimum viable:** Chrome 80+ / Firefox 114+ / Safari 15.2+ / Edge 80+. This aligns with the existing app's browser requirements (React 19, Vite 8).
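+
+The table translates into a runtime feature check along these lines. This is a minimal sketch: `detectLocalTranscribeSupport` and its result shape are hypothetical, and the global scope is injectable only so the function can be exercised outside a browser. It checks for the presence of the APIs, not for module-worker support specifically.
+
+```typescript
+interface LocalTranscribeSupport {
+  supported: boolean; // all required APIs are present
+  multiThreaded: boolean; // SharedArrayBuffer is usable (cross-origin isolated)
+  missing: string[]; // names of required APIs that were not found
+}
+
+const REQUIRED_APIS = [
+  "Worker",
+  "WebAssembly",
+  "MediaRecorder",
+  "AudioContext",
+  "OfflineAudioContext",
+];
+
+function detectLocalTranscribeSupport(
+  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
+): LocalTranscribeSupport {
+  const missing = REQUIRED_APIS.filter((name) => typeof scope[name] === "undefined");
+  // Without COOP/COEP the page is not isolated and WASM runs single-threaded.
+  const multiThreaded =
+    scope["crossOriginIsolated"] === true &&
+    typeof scope["SharedArrayBuffer"] !== "undefined";
+  return { supported: missing.length === 0, multiThreaded, missing };
+}
+```
+
+The hook could expose `supported` as `isSupported`, while `multiThreaded === false` is worth logging because it indicates the slower single-threaded fallback.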
+
+## Caching Strategy
+
+Transformers.js uses the browser's Cache API by default to store downloaded model files. Key behaviors:
+
+1. **First load:** Downloads ~105-140MB from Hugging Face Hub. Progress callback fires per-file and total.
+2. **Subsequent loads:** Loads from Cache API. Near-instant model initialization.
+3. **Cache API persistence:** Survives page reloads and browser restarts. Cleared only by user action (clear site data) or browser storage pressure.
+4. **v4 ModelRegistry:** `is_pipeline_cached()` allows checking cache state before showing download UI.
+5. **Cache clearing:** `clear_pipeline_cache()` allows users to free storage if needed.
+
+No IndexedDB wrapper or custom caching code needed -- Transformers.js handles this internally.
+
+## Sources
+
+- [@huggingface/transformers npm (v4.2.0 latest)](https://www.npmjs.com/package/@huggingface/transformers) -- verified via `npm view`
+- [Transformers.js v4 announcement (Feb 2026)](https://huggingface.co/blog/transformersjs-v4) -- ModelRegistry API, WebGPU runtime, monorepo restructure
+- [Transformers.js v4.0.0 release notes](https://github.com/huggingface/transformers.js/releases/tag/4.0.0) -- breaking changes, new features
+- [Transformers.js official docs: React tutorial](https://huggingface.co/docs/transformers.js/tutorials/react) -- Web Worker pattern, singleton, message protocol
+- [Transformers.js official docs: dtypes/quantization](https://huggingface.co/docs/transformers.js/guides/dtypes) -- per-module dtype, encoder sensitivity
+- [Transformers.js official docs: WebGPU guide](https://github.com/huggingface/transformers.js/blob/main/packages/transformers/docs/source/guides/webgpu.md) -- ASR pipeline with WebGPU
+- [Transformers.js official docs: Node audio processing](https://github.com/huggingface/transformers.js/blob/main/packages/transformers/docs/source/guides/node-audio-processing.md) -- audio format requirements (16kHz, Float32Array)
+- [onnx-community/whisper-base on HF Hub](https://huggingface.co/onnx-community/whisper-base) -- ONNX model, 36K downloads/month
+- [whisper-web reference implementation](https://github.com/xenova/whisper-web) -- Web Worker architecture for browser Whisper
+- [Speech recognition blog post (Jan 2025)](https://blog.rasc.ch/2025/01/transformers-js-speech.html) -- Worker setup, audio capture, MediaRecorder pattern
+- [Vite COOP/COEP configuration](https://gist.github.com/mizchi/afcc5cf233c9e6943720fde4b4579a2b) -- server.headers config for SharedArrayBuffer
+- [Context7: Transformers.js documentation](https://context7.com/huggingface/transformers.js) -- pipeline API, ASR usage, worker patterns
diff --git a/.planning/research/SUMMARY.md b/.planning/research/SUMMARY.md
new file mode 100644
index 000000000..670e5a2ce
--- /dev/null
+++ b/.planning/research/SUMMARY.md
@@ -0,0 +1,163 @@
+# Project Research Summary
+
+**Project:** Lokale Spracherkennung mit Transformers.js
+**Domain:** Browser-based ML inference (speech recognition) integrated into existing enterprise chat platform
+**Researched:** 2026-05-07
+**Confidence:** HIGH
+
+## Executive Summary
+
+This project adds a third speech recognition option to the c4 GenAI Suite -- one that runs Whisper inference entirely in the browser via Transformers.js, ensuring audio data never leaves the user's device. The architecture is well-understood: a Web Worker runs the Transformers.js pipeline (whisper-base, ~140MB ONNX model), audio is captured via MediaRecorder and resampled to 16kHz mono Float32Array using OfflineAudioContext, and the result is inserted into the chat input. The backend contribution is minimal -- a single extension registration file with no middleware, no API keys, no server-side processing. The heavy lifting is entirely frontend.
+
+The recommended approach uses `@huggingface/transformers` v4.2+ with the `onnx-community/whisper-base` model, a record-then-transcribe flow (not real-time streaming), and per-module quantization (fp32 encoder, q8 decoder) to reduce download size to ~105MB with negligible quality loss. The existing extension system, hook patterns, and UI components provide strong integration templates -- the new feature follows established patterns for `TranscribeButton` and `useTranscribe`, meaning the implementation is largely "fill in the blanks" rather than novel architecture.
+
+The primary risks are infrastructure-level, not algorithmic. Cross-origin isolation headers (COOP/COEP) must be configured correctly for WASM multi-threading performance, but the `credentialless` COEP policy avoids breaking existing cross-origin resources. Vite's bundler must exclude `onnxruntime-web` from pre-bundling, and the Web Worker construction pattern must follow Vite's exact syntactic requirements to survive production builds. Memory management (pipeline disposal, singleton pattern, React Strict Mode guards) is the other critical concern. All of these pitfalls are well-documented with clear prevention strategies.
+
+## Key Findings
+
+### Recommended Stack
+
+The stack is minimal -- one new npm dependency plus browser built-ins. Transformers.js v4 provides the complete ML inference runtime including model loading, caching, progress callbacks, and the ASR pipeline. Everything else (audio capture, resampling, Web Workers) uses native browser APIs already proven in the existing codebase.
+
+**Core technologies:**
+- `@huggingface/transformers` v4.2+: ML inference runtime -- wraps ONNX Runtime Web with pipeline API, model caching, progress tracking, and TypeScript types. The v4 ModelRegistry API directly enables the required download progress UX.
+- `onnx-community/whisper-base` (ONNX model): Pre-converted Whisper model, ~140MB (or ~105MB with q8 decoder). 36K+ monthly downloads, used by 23+ HF Spaces. No manual ONNX conversion needed.
+- Web Worker (native, ES module): Mandatory for running inference off main thread. Vite natively supports `new Worker(new URL(...), { type: 'module' })` with full TypeScript and import resolution.
+- MediaRecorder + OfflineAudioContext (browser built-ins): Audio capture and 16kHz resampling. Same MediaRecorder pattern already used by the existing `useTranscribe` hook.
+
+### Expected Features
+
+**Must have (table stakes):**
+- Microphone toggle button with recording state indication (pulse/red/disabled)
+- Model download progress bar (~140MB first-time download)
+- Language selection (de/en) via dropdown
+- Record-then-transcribe flow with transcription spinner
+- Transcript insertion into chat textarea
+- Error handling for all failure modes (mic denied, download failed, browser unsupported, empty transcription)
+- Max recording duration enforcement (2 minutes)
+- Browser compatibility detection (Web Worker, WASM)
+- Microphone permission handling before model download
+
+**Should have (differentiators):**
+- Privacy badge/indicator -- reinforces core value proposition at near-zero cost
+- Recording timer display (elapsed/max) -- low effort, high polish
+- Silence/no-speech detection -- prevents Whisper hallucinations on empty audio
+- WebGPU acceleration (transparent, feature-detected) -- significant performance gain where available
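+
+The silence/no-speech check can be a plain RMS energy gate that runs before inference. The sketch below uses the `0.01` starting threshold from the silence-detection spec elsewhere in this change set; the function and constant names are hypothetical and the threshold is expected to need tuning.
+
+```typescript
+const SILENCE_RMS_THRESHOLD = 0.01; // starting value, meant to be tuned
+
+// Root-mean-square energy of a mono sample buffer.
+function rms(samples: Float32Array): number {
+  if (samples.length === 0) return 0;
+  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
+  return Math.sqrt(sumSquares / samples.length);
+}
+
+// True when the recording is too quiet to be worth transcribing.
+function isSilence(samples: Float32Array, threshold = SILENCE_RMS_THRESHOLD): boolean {
+  return rms(samples) < threshold;
+}
+```
+
+When `isSilence` returns true, the Worker can skip the pipeline call entirely, which avoids Whisper hallucinating text on empty audio.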
+
+**Defer (v2+):**
+- Audio level visualization -- medium effort for visual polish only
+- Transcription confidence feedback -- API limitations make this hard
+- Real-time streaming transcription -- architecturally prepare but do not implement
+- Model selection by end users -- fix whisper-base; admin-configurable later if needed
+
+### Architecture Approach
+
+The architecture cleanly separates concerns: a thin backend extension (registration only, no server logic), a React hook (`useLocalTranscribe`) that orchestrates recording and Worker communication, a Web Worker (`whisper.worker.ts`) that owns the Transformers.js pipeline lifecycle, and a UI component (`LocalTranscribeButton`) that mirrors the existing TranscribeButton. Audio flows from microphone through main-thread resampling (OfflineAudioContext), then to the Worker via zero-copy transfer for inference.
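+
+The zero-copy transfer mentioned above comes down to passing the sample buffer in the transfer list of `postMessage`. A minimal sketch, assuming the `transcribe` message shape from the stack research; `buildTranscribeRequest` is a hypothetical helper:
+
+```typescript
+interface TranscribeRequest {
+  type: "transcribe";
+  data: { audio: Float32Array; language: string };
+}
+
+// Returns the message together with its transfer list. Passing both to
+// worker.postMessage(message, transfer) moves the buffer instead of copying it.
+function buildTranscribeRequest(
+  audio: Float32Array,
+  language: string,
+): { message: TranscribeRequest; transfer: ArrayBuffer[] } {
+  return {
+    message: { type: "transcribe", data: { audio, language } },
+    // Assumes the samples are backed by a plain ArrayBuffer, not a SharedArrayBuffer.
+    transfer: [audio.buffer as ArrayBuffer],
+  };
+}
+```
+
+After the transfer the sender's `audio` array is detached and has length 0, so any duration or level calculation on the main thread has to happen before the message is posted.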
+
+**Major components:**
+1. `LocalTranscribeExtension` (backend) -- registers extension with name `transcribe-local`, group `speech-to-text`, type `other`. No arguments, no middlewares.
+2. `whisper.worker.ts` (frontend) -- singleton pipeline, handles load/transcribe/unload messages, reports progress and results. Isolates all ML inference from main thread.
+3. `useLocalTranscribe` hook (frontend) -- state machine (idle/loading-model/recording/processing/error), manages MediaRecorder, audio preprocessing, Worker lifecycle.
+4. `LocalTranscribeButton` (frontend) -- UI component with mic button, language dropdown, progress bar. Follows existing SpeechRecognitionButton layout pattern.
+5. `audio-utils.ts` (frontend) -- `audioToFloat32At16kHz()` utility for Blob-to-Float32Array conversion via OfflineAudioContext.
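+
+The hook's state machine can be written as a transition table. The states below are the ones listed for `useLocalTranscribe`; the event names and the exact transitions are assumptions for illustration, and the real hook may order them differently (for example, skipping `loading-model` when the model is already cached).
+
+```typescript
+type TranscribeState = "idle" | "loading-model" | "recording" | "processing" | "error";
+type TranscribeEvent = "START" | "MODEL_READY" | "STOP" | "RESULT" | "FAIL" | "RESET";
+
+// Hypothetical transition table: any pair not listed leaves the state unchanged.
+const transitions: Record<TranscribeState, Partial<Record<TranscribeEvent, TranscribeState>>> = {
+  idle: { START: "loading-model", FAIL: "error" },
+  "loading-model": { MODEL_READY: "recording", FAIL: "error" },
+  recording: { STOP: "processing", FAIL: "error" },
+  processing: { RESULT: "idle", FAIL: "error" },
+  error: { RESET: "idle" },
+};
+
+function nextState(state: TranscribeState, event: TranscribeEvent): TranscribeState {
+  return transitions[state][event] ?? state;
+}
+```
+
+Keeping the table separate from the React code makes invalid sequences, such as a `RESULT` arriving while still recording, easy to unit-test without mounting the hook.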
+
+### Critical Pitfalls
+
+1. **COOP/COEP headers missing (silent 3-4x performance collapse)** -- Without cross-origin isolation headers, ONNX Runtime silently falls back to single-threaded WASM. Use `credentialless` COEP policy (not `require-corp`). Configure in both Vite dev plugin and production server. Verify with `self.crossOriginIsolated === true`.
+
+2. **Vite bundler misconfiguration for ONNX Runtime** -- Vite's pre-bundling cannot process `onnxruntime-web` WASM binaries. Add `optimizeDeps: { exclude: ['onnxruntime-web'] }` and `assetsInclude: ['**/*.onnx']` to vite.config.ts before any Transformers.js imports.
+
+3. **Web Worker construction pattern must be syntactically exact** -- Vite requires `new Worker(new URL('./worker.ts', import.meta.url), { type: 'module' })` as a single expression. Separating the URL into a variable breaks production builds while working fine in dev.
+
+4. **Memory leak from undisposed pipeline** -- Pipeline objects hold ~140MB model weights. Use a persistent singleton Worker (do not terminate on component unmount). Guard against React Strict Mode double-mount. Call `pipeline.dispose()` only on explicit unload.
+
+5. **Audio format conversion errors (silent wrong output)** -- Whisper requires 16kHz mono Float32Array. Incorrect resampling produces garbage transcriptions without errors. Use `OfflineAudioContext` so the browser performs the resampling; never attempt manual interpolation.
+
+## Implications for Roadmap
+
+Based on research, suggested phase structure:
+
+### Phase 1: Infrastructure and Scaffolding
+
+**Rationale:** Three critical pitfalls (COOP/COEP headers, Vite bundler config, package/model ID) must be resolved before any feature code is written. Getting infrastructure wrong means debugging false negatives throughout all subsequent phases.
+**Delivers:** Working build pipeline with Transformers.js, correct Vite configuration, COOP/COEP headers, backend extension registration, i18n text keys, basic project scaffolding.
+**Addresses:** Browser compatibility detection, backend extension registration.
+**Avoids:** Pitfalls 1 (COOP/COEP), 2 (Vite bundler), 7 (COEP breaking existing resources), 9 (wrong package/model ID).
+
+### Phase 2: Core Pipeline (Worker + Audio + Hook)
+
+**Rationale:** This is the technical core -- the Web Worker, audio processing utility, and React hook. These three components have tight dependencies (hook depends on both Worker and audio utility) and must be built and tested together. This phase has the highest pitfall density (7 pitfalls, listed under "Avoids" below).
+**Delivers:** Working end-to-end transcription pipeline: record audio, resample, send to Worker, run Whisper inference, return text. No UI yet -- testable via hook alone.
+**Addresses:** Record-then-transcribe flow, model download with progress callback, transcript delivery, max duration enforcement, microphone permission handling.
+**Avoids:** Pitfalls 3 (Worker pattern), 4 (memory leak), 5 (audio format), 6 (no progress feedback), 8 (structured clone vs transfer), 11 (MIME type), 12 (Strict Mode double-mount).
+
+### Phase 3: UI Integration
+
+**Rationale:** With the hook delivering a clean API (state, toggleRecording, modelProgress), the UI layer is straightforward and follows established component patterns. Separating UI from core pipeline allows the pipeline to stabilize before adding visual complexity.
+**Delivers:** Fully integrated local transcription in the chat UI, indistinguishable in interaction pattern from existing cloud options.
+**Addresses:** Microphone toggle button, recording state indication, transcription progress indicator, model download progress bar, language selection dropdown, error toasts.
+
+### Phase 4: Polish and Hardening
+
+**Rationale:** Differentiator features and edge-case hardening should come after the core flow is stable. These are low-effort, high-value additions that make the feature feel production-ready.
+**Delivers:** Privacy indicator, recording timer, silence detection, WebGPU acceleration, mobile graceful degradation.
+**Addresses:** Privacy badge, recording timer display, silence/no-speech detection, WebGPU acceleration, model cache persistence.
+**Avoids:** Pitfalls 10 (WASM threading bug), 13 (cache eviction), 14 (language parameter), 15 (mobile crashes).
+
+### Phase Ordering Rationale
+
+- Infrastructure first because three critical pitfalls (headers, bundler, package identity) block all other work. Debugging Whisper accuracy when the real problem is wrong COEP headers wastes days.
+- Core pipeline before UI because the hook API shape must stabilize before building components against it. The Worker and audio utility have the highest pitfall density and are where most debugging time will be spent.
+- UI as a separate phase because it follows established patterns (TranscribeButton, SpeechRecognitionButton) and is relatively low-risk once the hook API is solid.
+- Polish last because differentiators (privacy badge, timer, silence detection) add value but are not blocking for a functional feature.
+
+### Research Flags
+
+Phases likely needing deeper research during planning:
+- **Phase 1:** COOP/COEP header interaction with existing app resources (proxy to backend, any CDN-loaded assets). Needs hands-on testing, not just research.
+- **Phase 2:** ONNX Runtime WASM threading behavior with the specific `onnxruntime-web` version bundled in Transformers.js v4.2. Verify Pitfall 10 status at implementation time.
+
+Phases with standard patterns (skip research-phase):
+- **Phase 3:** UI integration follows established codebase patterns (TranscribeButton, SpeechRecognitionButton, ChatInput.tsx detection logic). The architecture research already provides the exact integration code.
+- **Phase 4:** All polish features are well-documented (WebGPU detection, AnalyserNode for silence, navigator.storage.persist).
+
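The silence check listed for Phase 4 can be sketched as a small energy gate. This is a minimal illustration, assuming the check runs over the already-resampled mono `Float32Array` rather than through an `AnalyserNode`. The names `computeRms`, `isSilent`, and `SILENCE_RMS_THRESHOLD` are hypothetical, and `0.01` is only a starting value to tune.

```typescript
// Hypothetical tunable constant; 0.01 is a starting point, not a measured value.
const SILENCE_RMS_THRESHOLD = 0.01;

/** Root-mean-square energy of mono PCM samples in the range [-1, 1]. */
function computeRms(samples: Float32Array): number {
  if (samples.length === 0) {
    return 0; // An empty buffer carries no energy.
  }
  // Sum of squared samples, accumulated as a regular JS number.
  const sumOfSquares = samples.reduce((acc, sample) => acc + sample * sample, 0);
  return Math.sqrt(sumOfSquares / samples.length);
}

/** True when the buffer's energy is below the threshold, so transcription can be skipped. */
function isSilent(samples: Float32Array, threshold: number = SILENCE_RMS_THRESHOLD): boolean {
  return computeRms(samples) < threshold;
}
```

Running a gate like this before inference would avoid paying for a Whisper pass on empty recordings. A post-transcription filter for known hallucination strings would still be needed for low-level noise that passes the threshold.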
+## Confidence Assessment
+
+| Area | Confidence | Notes |
+|------|------------|-------|
+| Stack | HIGH | Single new dependency (`@huggingface/transformers`). v4 is stable (released March 2025, updated through May 2026). All other technologies are browser built-ins or already in use. |
+| Features | HIGH | Feature set derived from existing codebase patterns and explicit project requirements. Table stakes are clear. Anti-features are well-reasoned. |
+| Architecture | HIGH | Architecture follows the reference `whisper-web` implementation pattern. Component boundaries align with existing codebase structure. Build order has clear dependency graph. |
+| Pitfalls | HIGH | 15 pitfalls identified with specific prevention strategies. Critical pitfalls verified via official documentation, GitHub issues, and community reports. Two moderate pitfalls (WASM threading, model ID evolution) rated MEDIUM as they depend on specific library versions at implementation time. |
+
+**Overall confidence:** HIGH
+
+### Gaps to Address
+
+- **COOP/COEP impact on existing app:** The `credentialless` COEP policy should be safe, but must be tested against the full existing app (backend proxy at `/api-proxy`, any CDN resources, embedded content). This can only be validated by running the app with headers enabled.
+- **ONNX Runtime WASM threading stability:** Pitfall 10 notes a known bug in some `onnxruntime-web` versions. The specific version bundled with `@huggingface/transformers` v4.2 should be checked at implementation time. Start single-threaded for reliability.
+- **Mobile browser viability:** whisper-base may crash on low-memory mobile devices. The decision to support, degrade gracefully, or disable on mobile should be made during Phase 4 based on testing, not upfront.
+- **Model ID evolution:** The Hugging Face ONNX model ecosystem is actively evolving. `onnx-community/whisper-base` is current as of May 2026 but should be verified at implementation time.
+
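The COOP/COEP gap above comes down to two response headers plus a runtime check. The sketch below assumes the `credentialless` policy named in the notes. `crossOriginIsolationHeaders` and `canUseThreadedWasm` are hypothetical helper names, not part of the codebase.

```typescript
/** Response headers that opt a page into cross-origin isolation. */
function crossOriginIsolationHeaders(): Record<string, string> {
  return {
    'Cross-Origin-Opener-Policy': 'same-origin',
    // Assumption: 'credentialless' rather than 'require-corp', as the notes propose.
    'Cross-Origin-Embedder-Policy': 'credentialless',
  };
}

/**
 * Runtime check for cross-origin isolation, which SharedArrayBuffer
 * (and therefore multi-threaded WASM) depends on.
 * Takes the global scope as a parameter so it can be exercised outside a browser.
 */
function canUseThreadedWasm(scope: { crossOriginIsolated?: boolean }): boolean {
  return scope.crossOriginIsolated === true;
}

// Possible wiring in a Vite config (shown as a comment, untested against this app):
//   export default defineConfig({ server: { headers: crossOriginIsolationHeaders() } });
```

In the browser the check would be called as `canUseThreadedWasm(self)`. The dev-server headers only cover local development, so the production server or reverse proxy would need to send the same pair.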
+## Sources
+
+### Primary (HIGH confidence)
+- [@huggingface/transformers npm v4.2.0](https://www.npmjs.com/package/@huggingface/transformers) -- API surface, version history
+- [Transformers.js official documentation](https://huggingface.co/docs/transformers.js/index) -- pipeline API, Web Worker patterns, dtypes/quantization
+- [Transformers.js v4 announcement](https://huggingface.co/blog/transformersjs-v4) -- ModelRegistry API, WebGPU runtime, progress_total
+- [onnx-community/whisper-base on HF Hub](https://huggingface.co/onnx-community/whisper-base) -- model card, download stats
+- [whisper-web reference implementation](https://github.com/xenova/whisper-web) -- Worker architecture, audio pipeline
+- [MDN: Cross-Origin-Embedder-Policy](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Cross-Origin-Embedder-Policy) -- COEP policies, credentialless
+- [MDN: Transferable Objects](https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Transferable_objects) -- zero-copy Worker communication
+- [web.dev: COOP and COEP](https://web.dev/articles/coop-coep) -- cross-origin isolation requirements
+- Existing codebase: `useTranscribe.ts`, `useSpeechRecognitionToggle.ts`, `ChatInput.tsx`, `TranscribeButton.tsx`, `SpeechRecognitionButton.tsx` -- integration patterns
+
+### Secondary (MEDIUM confidence)
+- [Speech Recognition in Browser with Transformers.js](https://blog.rasc.ch/2025/01/transformers-js-speech.html) -- Worker setup, audio capture patterns
+- [Offline Whisper: Browser + Node.js (AssemblyAI)](https://www.assemblyai.com/blog/offline-speech-recognition-whisper-browser-node-js) -- browser Whisper architecture
+- Transformers.js GitHub issues (#715, #740, #860, #882, #958, #988, #1016) -- pitfall verification
+- Vite GitHub issues (#3909, #5979, #10837, #15962, #16536) -- bundler behavior verification
+
+---
+*Research completed: 2026-05-07*
+*Ready for roadmap: yes*
diff --git a/backend/src/extensions/module.ts b/backend/src/extensions/module.ts
index 9e58a2707..292531645 100644
--- a/backend/src/extensions/module.ts
+++ b/backend/src/extensions/module.ts
@@ -20,6 +20,7 @@ import { OpenAIModelExtension } from './models/open-ai';
import { OpenAICompatibleModelExtension } from './models/open-ai-compatible';
import { AzureTranscribeExtension } from './other/azure-transcribe';
import { CustomPromptExtension } from './other/custom';
+import { LocalTranscribeExtension } from './other/local-transcribe';
import { SpeechToTextExtension } from './other/speech-to-text';
import { SummaryPromptExtension } from './other/summary';
import { AzureAISearchExtension } from './tools/azure-ai-search';
@@ -123,6 +124,7 @@ export class ExtensionLibraryModule {
GoogleGenAIModelExtension,
GPTImage1Extension,
GroundingWithBingSearchExtension,
+ LocalTranscribeExtension,
MCPToolsExtension,
MistralModelExtension,
GeminiImageExtension,
diff --git a/backend/src/extensions/other/local-transcribe.spec.ts b/backend/src/extensions/other/local-transcribe.spec.ts
new file mode 100644
index 000000000..e594ad364
--- /dev/null
+++ b/backend/src/extensions/other/local-transcribe.spec.ts
@@ -0,0 +1,52 @@
+import { User } from '../../domain/users';
+import { I18nService } from '../../localization/i18n.service';
+import { LocalTranscribeExtension } from './local-transcribe';
+
+describe('LocalTranscribeExtension', () => {
+ let extension: LocalTranscribeExtension;
+
+ const i18n = {
+ t: (val: string) => val,
+ } as unknown as I18nService;
+
+ const mockUser: User = {
+ id: 'test-user',
+ name: 'Test User',
+ email: 'test@example.com',
+ userGroupIds: [],
+ };
+
+ beforeEach(() => {
+ extension = new LocalTranscribeExtension(i18n);
+ });
+
+ describe('spec', () => {
+ it('should have correct name', () => {
+ expect(extension.spec.name).toBe('transcribe-local');
+ });
+
+ it('should have group set to speech-to-text', () => {
+ expect(extension.spec.group).toBe('speech-to-text');
+ });
+
+ it('should have type set to other', () => {
+ expect(extension.spec.type).toBe('other');
+ });
+
+ it('should have defaultLanguage as required select with de/en', () => {
+ const arg = extension.spec.arguments.defaultLanguage;
+ expect(arg).toMatchObject({
+ type: 'string',
+ required: true,
+ format: 'select',
+ examples: ['de', 'en'],
+ default: 'de',
+ });
+ });
+
+ it('should return empty middlewares', async () => {
+ const middlewares = await extension.getMiddlewares(mockUser);
+ expect(middlewares).toEqual([]);
+ });
+ });
+});
diff --git a/backend/src/extensions/other/local-transcribe.ts b/backend/src/extensions/other/local-transcribe.ts
new file mode 100644
index 000000000..659f07ffe
--- /dev/null
+++ b/backend/src/extensions/other/local-transcribe.ts
@@ -0,0 +1,38 @@
+import { ChatMiddleware } from '../../domain/chat';
+import { Extension, ExtensionConfiguration, ExtensionSpec } from '../../domain/extensions';
+import { User } from '../../domain/users';
+import { I18nService } from '../../localization/i18n.service';
+
+@Extension()
+export class LocalTranscribeExtension implements Extension {
+ constructor(private readonly i18n: I18nService) {}
+
+ get spec(): ExtensionSpec {
+ return {
+ name: 'transcribe-local',
+ group: 'speech-to-text',
+ title: this.i18n.t('texts.extensions.localTranscribe.title'),
+ logo: '',
+ description: this.i18n.t('texts.extensions.localTranscribe.description'),
+ type: 'other',
+ arguments: {
+ defaultLanguage: {
+ type: 'string',
+ title: this.i18n.t('texts.extensions.localTranscribe.defaultLanguage'),
+ required: true,
+ format: 'select',
+ examples: ['de', 'en'],
+ default: 'de',
+ },
+ },
+ };
+ }
+
+ getMiddlewares(_user: User): Promise<ChatMiddleware[]> {
+ return Promise.resolve([]);
+ }
+}
+
+export type LocalTranscribeConfiguration = ExtensionConfiguration & {
+ defaultLanguage: 'de' | 'en';
+};
diff --git a/backend/src/localization/i18n/de/texts.json b/backend/src/localization/i18n/de/texts.json
index a0e22966c..096784a48 100644
--- a/backend/src/localization/i18n/de/texts.json
+++ b/backend/src/localization/i18n/de/texts.json
@@ -214,6 +214,11 @@
"title": "Transcription: Azure OpenAI",
"description": "Audioaufnahmen mit Azure OpenAI in Text transkribieren"
},
+ "localTranscribe": {
+ "title": "Lokale Spracherkennung",
+ "description": "Audio wird lokal im Browser transkribiert - Audiodaten verlassen Ihr Gerät nicht",
+ "defaultLanguage": "Standardsprache"
+ },
"filesInConversation": {
"title": "Suche in Dateien im Chat",
"description": "Ermöglicht dem LLM, Dateien zu durchsuchen, die in einem Chat über die Büroklammer hochgeladen wurden.",
diff --git a/backend/src/localization/i18n/en/texts.json b/backend/src/localization/i18n/en/texts.json
index 3c926b7e4..8588f2193 100644
--- a/backend/src/localization/i18n/en/texts.json
+++ b/backend/src/localization/i18n/en/texts.json
@@ -214,6 +214,11 @@
"title": "Transcription: Azure OpenAI",
"description": "Transcribe audio recordings to text using Azure OpenAI"
},
+ "localTranscribe": {
+ "title": "Local Speech Recognition",
+ "description": "Transcribe audio locally in the browser - audio data never leaves your device",
+ "defaultLanguage": "Default Language"
+ },
"filesInConversation": {
"title": "Search Files in Chat",
"description": "Enables the LLM to search files uploaded via the paperclip to a conversation.",
diff --git a/frontend/package-lock.json b/frontend/package-lock.json
index 155ef5a4a..0e51a31ea 100644
--- a/frontend/package-lock.json
+++ b/frontend/package-lock.json
@@ -10,6 +10,7 @@
"license": "Apache-2.0",
"dependencies": {
"@floating-ui/react-dom": "^2.1.8",
+ "@huggingface/transformers": "^4.2.0",
"@mantine/colors-generator": "9.1.0",
"@mantine/core": "9.1.0",
"@mantine/dates": "^9.1.0",
@@ -1116,6 +1117,39 @@
"integrity": "sha512-RiB/yIh78pcIxl6lLMG0CgBXAZ2Y0eVHqMPYugu+9U0AeT6YBeiJpf7lbdJNIugFP5SIjwNRgo4DhR1Qxi26Gg==",
"license": "MIT"
},
+ "node_modules/@huggingface/jinja": {
+ "version": "0.5.8",
+ "resolved": "https://registry.npmjs.org/@huggingface/jinja/-/jinja-0.5.8.tgz",
+ "integrity": "sha512-ZdElB7DPS7QQS8ZnFc5RPPtkg+eN11z8AmIZWAyes6pSbwXqiFB/POVevvm01begdSX1ho9Gxln/F6qlQMsuaA==",
+ "license": "MIT",
+ "engines": {
+ "node": ">=18"
+ }
+ },
+ "node_modules/@huggingface/tokenizers": {
+ "version": "0.1.3",
+ "resolved": "https://registry.npmjs.org/@huggingface/tokenizers/-/tokenizers-0.1.3.tgz",
+ "integrity": "sha512-8rF/RRT10u+kn7YuUbUg0OF30K8rjTc78aHpxT+qJ1uWSqxT1MHi8+9ltwYfkFYJzT/oS+qw3JVfHtNMGAdqyA==",
+ "license": "Apache-2.0"
+ },
+ "node_modules/@huggingface/transformers": {
+ "version": "4.2.0",
+ "resolved": "https://registry.npmjs.org/@huggingface/transformers/-/transformers-4.2.0.tgz",
+ "integrity": "sha512-8BRCoBMH0XsWaEIamuR0LrJGAfftgHAfb2Vrffy0VKlSAE/MnUJ5/h/zTfEP3fDIft+nk7TqB8xXEyABGitBjQ==",
+ "license": "Apache-2.0",
+ "dependencies": {
+ "@huggingface/jinja": "^0.5.6",
+ "@huggingface/tokenizers": "^0.1.3",
+ "onnxruntime-node": "1.24.3",
+ "onnxruntime-web": "1.26.0-dev.20260416-b7804b056c",
+ "sharp": "^0.34.5"
+ }
+ },
+ "node_modules/@huggingface/transformers/node_modules/sharp": {
+ "resolved": "node_modules/@huggingface/transformers/stubs/sharp",
+ "link": true
+ },
+ "node_modules/@huggingface/transformers/stubs/sharp": {},
"node_modules/@humanfs/core": {
"version": "0.19.1",
"dev": true,
@@ -2303,6 +2337,70 @@
"url": "https://opencollective.com/pkgr"
}
},
+ "node_modules/@protobufjs/aspromise": {
+ "version": "1.1.2",
+ "resolved": "https://registry.npmjs.org/@protobufjs/aspromise/-/aspromise-1.1.2.tgz",
+ "integrity": "sha512-j+gKExEuLmKwvz3OgROXtrJ2UG2x8Ch2YZUxahh+s1F2HZ+wAceUNLkvy6zKCPVRkU++ZWQrdxsUeQXmcg4uoQ==",
+ "license": "BSD-3-Clause"
+ },
+ "node_modules/@protobufjs/base64": {
+ "version": "1.1.2",
+ "resolved": "https://registry.npmjs.org/@protobufjs/base64/-/base64-1.1.2.tgz",
+ "integrity": "sha512-AZkcAA5vnN/v4PDqKyMR5lx7hZttPDgClv83E//FMNhR2TMcLUhfRUBHCmSl0oi9zMgDDqRUJkSxO3wm85+XLg==",
+ "license": "BSD-3-Clause"
+ },
+ "node_modules/@protobufjs/codegen": {
+ "version": "2.0.5",
+ "resolved": "https://registry.npmjs.org/@protobufjs/codegen/-/codegen-2.0.5.tgz",
+ "integrity": "sha512-zgXFLzW3Ap33e6d0Wlj4MGIm6Ce8O89n/apUaGNB/jx+hw+ruWEp7EwGUshdLKVRCxZW12fp9r40E1mQrf/34g==",
+ "license": "BSD-3-Clause"
+ },
+ "node_modules/@protobufjs/eventemitter": {
+ "version": "1.1.0",
+ "resolved": "https://registry.npmjs.org/@protobufjs/eventemitter/-/eventemitter-1.1.0.tgz",
+ "integrity": "sha512-j9ednRT81vYJ9OfVuXG6ERSTdEL1xVsNgqpkxMsbIabzSo3goCjDIveeGv5d03om39ML71RdmrGNjG5SReBP/Q==",
+ "license": "BSD-3-Clause"
+ },
+ "node_modules/@protobufjs/fetch": {
+ "version": "1.1.0",
+ "resolved": "https://registry.npmjs.org/@protobufjs/fetch/-/fetch-1.1.0.tgz",
+ "integrity": "sha512-lljVXpqXebpsijW71PZaCYeIcE5on1w5DlQy5WH6GLbFryLUrBD4932W/E2BSpfRJWseIL4v/KPgBFxDOIdKpQ==",
+ "license": "BSD-3-Clause",
+ "dependencies": {
+ "@protobufjs/aspromise": "^1.1.1",
+ "@protobufjs/inquire": "^1.1.0"
+ }
+ },
+ "node_modules/@protobufjs/float": {
+ "version": "1.0.2",
+ "resolved": "https://registry.npmjs.org/@protobufjs/float/-/float-1.0.2.tgz",
+ "integrity": "sha512-Ddb+kVXlXst9d+R9PfTIxh1EdNkgoRe5tOX6t01f1lYWOvJnSPDBlG241QLzcyPdoNTsblLUdujGSE4RzrTZGQ==",
+ "license": "BSD-3-Clause"
+ },
+ "node_modules/@protobufjs/inquire": {
+ "version": "1.1.1",
+ "resolved": "https://registry.npmjs.org/@protobufjs/inquire/-/inquire-1.1.1.tgz",
+ "integrity": "sha512-mnzgDV26ueAvk7rsbt9L7bE0SuAoqyuys/sMMrmVcN5x9VsxpcG3rqAUSgDyLp0UZlmNfIbQ4fHfCtreVBk8Ew==",
+ "license": "BSD-3-Clause"
+ },
+ "node_modules/@protobufjs/path": {
+ "version": "1.1.2",
+ "resolved": "https://registry.npmjs.org/@protobufjs/path/-/path-1.1.2.tgz",
+ "integrity": "sha512-6JOcJ5Tm08dOHAbdR3GrvP+yUUfkjG5ePsHYczMFLq3ZmMkAD98cDgcT2iA1lJ9NVwFd4tH/iSSoe44YWkltEA==",
+ "license": "BSD-3-Clause"
+ },
+ "node_modules/@protobufjs/pool": {
+ "version": "1.1.0",
+ "resolved": "https://registry.npmjs.org/@protobufjs/pool/-/pool-1.1.0.tgz",
+ "integrity": "sha512-0kELaGSIDBKvcgS4zkjz1PeddatrjYcmMWOlAuAPwAeccUrPHdUqo/J6LiymHHEiJT5NrF1UVwxY14f+fy4WQw==",
+ "license": "BSD-3-Clause"
+ },
+ "node_modules/@protobufjs/utf8": {
+ "version": "1.1.1",
+ "resolved": "https://registry.npmjs.org/@protobufjs/utf8/-/utf8-1.1.1.tgz",
+ "integrity": "sha512-oOAWABowe8EAbMyWKM0tYDKi8Yaox52D+HWZhAIJqQXbqe0xI/GV7FhLWqlEKreMkfDjshR5FKgi3mnle0h6Eg==",
+ "license": "BSD-3-Clause"
+ },
"node_modules/@reduxjs/toolkit": {
"version": "2.11.2",
"resolved": "https://registry.npmjs.org/@reduxjs/toolkit/-/toolkit-2.11.2.tgz",
@@ -3634,7 +3732,6 @@
"version": "25.5.2",
"resolved": "https://registry.npmjs.org/@types/node/-/node-25.5.2.tgz",
"integrity": "sha512-tO4ZIRKNC+MDWV4qKVZe3Ql/woTnmHDr5JD8UI5hn2pwBrHEwOEMZK7WlNb5RKB6EoJ02gwmQS9OrjuFnZYdpg==",
- "devOptional": true,
"license": "MIT",
"dependencies": {
"undici-types": "~7.18.0"
@@ -4180,6 +4277,15 @@
"acorn": "^6.0.0 || ^7.0.0 || ^8.0.0"
}
},
+ "node_modules/adm-zip": {
+ "version": "0.5.17",
+ "resolved": "https://registry.npmjs.org/adm-zip/-/adm-zip-0.5.17.tgz",
+ "integrity": "sha512-+Ut8d9LLqwEvHHJl1+PIHqoyDxFgVN847JTVM3Izi3xHDWPE4UtzzXysMZQs64DMcrJfBeS/uoEP4AD3HQHnQQ==",
+ "license": "MIT",
+ "engines": {
+ "node": ">=12.0"
+ }
+ },
"node_modules/agent-base": {
"version": "7.1.3",
"dev": true,
@@ -4568,6 +4674,13 @@
"require-from-string": "^2.0.2"
}
},
+ "node_modules/boolean": {
+ "version": "3.2.0",
+ "resolved": "https://registry.npmjs.org/boolean/-/boolean-3.2.0.tgz",
+ "integrity": "sha512-d0II/GO9uf9lfUHH2BQsjxzRJZBdsjgsBiW4BvhWk/3qoKwQFjIDVN19PfX8F2D/r9PCMTtLWjYVCFrpeYUzsw==",
+ "deprecated": "Package no longer supported. Contact Support at https://www.npmjs.com/support for more info.",
+ "license": "MIT"
+ },
"node_modules/brace-expansion": {
"version": "1.1.11",
"dev": true,
@@ -5383,7 +5496,6 @@
},
"node_modules/define-data-property": {
"version": "1.1.4",
- "dev": true,
"license": "MIT",
"dependencies": {
"es-define-property": "^1.0.0",
@@ -5399,7 +5511,6 @@
},
"node_modules/define-properties": {
"version": "1.2.1",
- "dev": true,
"license": "MIT",
"dependencies": {
"define-data-property": "^1.0.1",
@@ -5444,13 +5555,20 @@
}
},
"node_modules/detect-libc": {
- "version": "2.0.4",
- "resolved": "https://registry.npmjs.org/detect-libc/-/detect-libc-2.0.4.tgz",
- "integrity": "sha512-3UDv+G9CsCKO1WKMGw9fwq/SWJYbI0c5Y7LU1AXYoDdbhE2AHQ6N6Nb34sG8Fj7T5APy8qXDCKuuIHd1BR0tVA==",
+ "version": "2.1.2",
+ "resolved": "https://registry.npmjs.org/detect-libc/-/detect-libc-2.1.2.tgz",
+ "integrity": "sha512-Btj2BOOO83o3WyH59e8MgXsxEQVcarkUOpEYrubB0urwnN10yQ364rsiByU11nZlqWYZm05i/of7io4mzihBtQ==",
+ "license": "Apache-2.0",
"engines": {
"node": ">=8"
}
},
+ "node_modules/detect-node": {
+ "version": "2.1.0",
+ "resolved": "https://registry.npmjs.org/detect-node/-/detect-node-2.1.0.tgz",
+ "integrity": "sha512-T0NIuQpnTvFDATNuHN5roPwSBG83rFsuO+MXXH9/3N1eFbn4wcPjttvjMLEPWJ0RGUYgQE7cGgS3tNxbqCGM7g==",
+ "license": "MIT"
+ },
"node_modules/detect-node-es": {
"version": "1.1.0",
"resolved": "https://registry.npmjs.org/detect-node-es/-/detect-node-es-1.1.0.tgz",
@@ -5626,7 +5744,6 @@
},
"node_modules/es-define-property": {
"version": "1.0.1",
- "dev": true,
"license": "MIT",
"engines": {
"node": ">= 0.4"
@@ -5634,7 +5751,6 @@
},
"node_modules/es-errors": {
"version": "1.3.0",
- "dev": true,
"license": "MIT",
"engines": {
"node": ">= 0.4"
@@ -5735,6 +5851,12 @@
"benchmarks"
]
},
+ "node_modules/es6-error": {
+ "version": "4.1.1",
+ "resolved": "https://registry.npmjs.org/es6-error/-/es6-error-4.1.1.tgz",
+ "integrity": "sha512-Um/+FxMr9CISWh0bi5Zv0iOD+4cFh5qLeks1qhAopKVAJw3drgKbKySikp7wGhDL0HPeaja0P5ULZrxLkniUVg==",
+ "license": "MIT"
+ },
"node_modules/es6-promise": {
"version": "3.3.1",
"dev": true,
@@ -5750,7 +5872,6 @@
},
"node_modules/escape-string-regexp": {
"version": "4.0.0",
- "dev": true,
"license": "MIT",
"engines": {
"node": ">=10"
@@ -6469,6 +6590,12 @@
"node": ">=16"
}
},
+ "node_modules/flatbuffers": {
+ "version": "25.9.23",
+ "resolved": "https://registry.npmjs.org/flatbuffers/-/flatbuffers-25.9.23.tgz",
+ "integrity": "sha512-MI1qs7Lo4Syw0EOzUl0xjs2lsoeqFku44KpngfIduHBYvzm8h2+7K8YMQh1JtVVVrUvhLpNwqVi4DERegUJhPQ==",
+ "license": "Apache-2.0"
+ },
"node_modules/flatted": {
"version": "3.4.2",
"resolved": "https://registry.npmjs.org/flatted/-/flatted-3.4.2.tgz",
@@ -6771,6 +6898,23 @@
"url": "https://github.com/sponsors/isaacs"
}
},
+ "node_modules/global-agent": {
+ "version": "3.0.0",
+ "resolved": "https://registry.npmjs.org/global-agent/-/global-agent-3.0.0.tgz",
+ "integrity": "sha512-PT6XReJ+D07JvGoxQMkT6qji/jVNfX/h364XHZOWeRzy64sSFr+xJ5OX7LI3b4MPQzdL4H8Y8M0xzPpsVMwA8Q==",
+ "license": "BSD-3-Clause",
+ "dependencies": {
+ "boolean": "^3.0.1",
+ "es6-error": "^4.1.1",
+ "matcher": "^3.0.0",
+ "roarr": "^2.15.3",
+ "semver": "^7.3.2",
+ "serialize-error": "^7.0.1"
+ },
+ "engines": {
+ "node": ">=10.0"
+ }
+ },
"node_modules/globals": {
"version": "17.4.0",
"resolved": "https://registry.npmjs.org/globals/-/globals-17.4.0.tgz",
@@ -6786,7 +6930,6 @@
},
"node_modules/globalthis": {
"version": "1.0.4",
- "dev": true,
"license": "MIT",
"dependencies": {
"define-properties": "^1.2.1",
@@ -6801,7 +6944,6 @@
},
"node_modules/gopd": {
"version": "1.2.0",
- "dev": true,
"license": "MIT",
"engines": {
"node": ">= 0.4"
@@ -6824,6 +6966,12 @@
"node": "^12.22.0 || ^14.16.0 || ^16.0.0 || >=17.0.0"
}
},
+ "node_modules/guid-typescript": {
+ "version": "1.0.9",
+ "resolved": "https://registry.npmjs.org/guid-typescript/-/guid-typescript-1.0.9.tgz",
+ "integrity": "sha512-Y8T4vYhEfwJOTbouREvG+3XDsjr8E3kIr7uf+JZ0BYloFsttiHU0WfvANVsR7TxNUJa/WpCnw/Ino/p+DeBhBQ==",
+ "license": "ISC"
+ },
"node_modules/has-bigints": {
"version": "1.1.0",
"dev": true,
@@ -6845,7 +6993,6 @@
},
"node_modules/has-property-descriptors": {
"version": "1.0.2",
- "dev": true,
"license": "MIT",
"dependencies": {
"es-define-property": "^1.0.0"
@@ -7914,6 +8061,12 @@
"dev": true,
"license": "MIT"
},
+ "node_modules/json-stringify-safe": {
+ "version": "5.0.1",
+ "resolved": "https://registry.npmjs.org/json-stringify-safe/-/json-stringify-safe-5.0.1.tgz",
+ "integrity": "sha512-ZClg6AaYvamvYEE82d3Iyd3vSSIjQ+odgjaTzRuO3s7toCdFKczob2i0zCh7JE8kWn17yvAWhUVxvqGwUalsRA==",
+ "license": "ISC"
+ },
"node_modules/json5": {
"version": "2.2.3",
"dev": true,
@@ -8384,6 +8537,12 @@
"dev": true,
"license": "MIT"
},
+ "node_modules/long": {
+ "version": "5.3.2",
+ "resolved": "https://registry.npmjs.org/long/-/long-5.3.2.tgz",
+ "integrity": "sha512-mNAgZ1GmyNhD7AuqnTG3/VQ26o760+ZYBPKjPvugO8+nLbYfX6TVpJPseBvopbdY+qpZ/lKUnmEc1LeZYS3QAA==",
+ "license": "Apache-2.0"
+ },
"node_modules/longest-streak": {
"version": "3.1.0",
"license": "MIT",
@@ -8508,6 +8667,18 @@
"url": "https://github.com/sponsors/wooorm"
}
},
+ "node_modules/matcher": {
+ "version": "3.0.0",
+ "resolved": "https://registry.npmjs.org/matcher/-/matcher-3.0.0.tgz",
+ "integrity": "sha512-OkeDaAZ/bQCxeFAozM55PKcKU0yJMPGifLwV4Qgjitu+5MoAfSQN4lsLJeXZ1b8w0x+/Emda6MZgXS1jvsapng==",
+ "license": "MIT",
+ "dependencies": {
+ "escape-string-regexp": "^4.0.0"
+ },
+ "engines": {
+ "node": ">=10"
+ }
+ },
"node_modules/math-intrinsics": {
"version": "1.1.0",
"dev": true,
@@ -9728,7 +9899,6 @@
},
"node_modules/object-keys": {
"version": "1.1.1",
- "dev": true,
"license": "MIT",
"engines": {
"node": ">= 0.4"
@@ -9825,6 +9995,49 @@
],
"license": "MIT"
},
+ "node_modules/onnxruntime-common": {
+ "version": "1.24.3",
+ "resolved": "https://registry.npmjs.org/onnxruntime-common/-/onnxruntime-common-1.24.3.tgz",
+ "integrity": "sha512-GeuPZO6U/LBJXvwdaqHbuUmoXiEdeCjWi/EG7Y1HNnDwJYuk6WUbNXpF6luSUY8yASul3cmUlLGrCCL1ZgVXqA==",
+ "license": "MIT"
+ },
+ "node_modules/onnxruntime-node": {
+ "version": "1.24.3",
+ "resolved": "https://registry.npmjs.org/onnxruntime-node/-/onnxruntime-node-1.24.3.tgz",
+ "integrity": "sha512-JH7+czbc8ALA819vlTgcV+Q214/+VjGeBHDjX81+ZCD0PCVCIFGFNtT0V4sXG/1JXypKPgScQcB3ij/hk3YnTg==",
+ "hasInstallScript": true,
+ "license": "MIT",
+ "os": [
+ "win32",
+ "darwin",
+ "linux"
+ ],
+ "dependencies": {
+ "adm-zip": "^0.5.16",
+ "global-agent": "^3.0.0",
+ "onnxruntime-common": "1.24.3"
+ }
+ },
+ "node_modules/onnxruntime-web": {
+ "version": "1.26.0-dev.20260416-b7804b056c",
+ "resolved": "https://registry.npmjs.org/onnxruntime-web/-/onnxruntime-web-1.26.0-dev.20260416-b7804b056c.tgz",
+ "integrity": "sha512-MD6Ss4GSpQBo6zqoJzyT9LRbKYs7x/JVN23FT24EcEvlqF4VuzPOeH6X38orZPKHQDbprn7K+SBpu0/mj2CQiw==",
+ "license": "MIT",
+ "dependencies": {
+ "flatbuffers": "^25.1.24",
+ "guid-typescript": "^1.0.9",
+ "long": "^5.2.3",
+ "onnxruntime-common": "1.24.0-dev.20251116-b39e144322",
+ "platform": "^1.3.6",
+ "protobufjs": "^7.2.4"
+ }
+ },
+ "node_modules/onnxruntime-web/node_modules/onnxruntime-common": {
+ "version": "1.24.0-dev.20251116-b39e144322",
+ "resolved": "https://registry.npmjs.org/onnxruntime-common/-/onnxruntime-common-1.24.0-dev.20251116-b39e144322.tgz",
+ "integrity": "sha512-BOoomdHYmNRL5r4iQ4bMvsl2t0/hzVQ3OM3PHD0gxeXu1PmggqBv3puZicEUVOA3AtHHYmqZtjMj9FOfGrATTw==",
+ "license": "MIT"
+ },
"node_modules/openapi-types": {
"version": "12.1.3",
"dev": true,
@@ -10117,6 +10330,12 @@
"url": "https://github.com/sponsors/jonschlinkert"
}
},
+ "node_modules/platform": {
+ "version": "1.3.6",
+ "resolved": "https://registry.npmjs.org/platform/-/platform-1.3.6.tgz",
+ "integrity": "sha512-fnWVljUchTro6RiCFvCXBbNhJc2NijN7oIQxbwsyL0buWJPG85v81ehlHI9fXrJsMNgTofEoWIQeClKpgxFLrg==",
+ "license": "MIT"
+ },
"node_modules/possible-typed-array-names": {
"version": "1.1.0",
"dev": true,
@@ -10457,6 +10676,30 @@
"url": "https://github.com/sponsors/wooorm"
}
},
+ "node_modules/protobufjs": {
+ "version": "7.5.6",
+ "resolved": "https://registry.npmjs.org/protobufjs/-/protobufjs-7.5.6.tgz",
+ "integrity": "sha512-M71sTMB146U3u0di3yup8iM+zv8yPRNQVr1KK4tyBitl3qFvEGucq/rGDRShD2rsJhtN02RJaJ7j5X5hmy8SJg==",
+ "hasInstallScript": true,
+ "license": "BSD-3-Clause",
+ "dependencies": {
+ "@protobufjs/aspromise": "^1.1.2",
+ "@protobufjs/base64": "^1.1.2",
+ "@protobufjs/codegen": "^2.0.5",
+ "@protobufjs/eventemitter": "^1.1.0",
+ "@protobufjs/fetch": "^1.1.0",
+ "@protobufjs/float": "^1.0.2",
+ "@protobufjs/inquire": "^1.1.1",
+ "@protobufjs/path": "^1.1.2",
+ "@protobufjs/pool": "^1.1.0",
+ "@protobufjs/utf8": "^1.1.1",
+ "@types/node": ">=13.7.0",
+ "long": "^5.0.0"
+ },
+ "engines": {
+ "node": ">=12.0.0"
+ }
+ },
"node_modules/proxy-agent": {
"version": "6.5.0",
"dev": true,
@@ -11181,6 +11424,23 @@
"url": "https://github.com/sponsors/isaacs"
}
},
+ "node_modules/roarr": {
+ "version": "2.15.4",
+ "resolved": "https://registry.npmjs.org/roarr/-/roarr-2.15.4.tgz",
+ "integrity": "sha512-CHhPh+UNHD2GTXNYhPWLnU8ONHdI+5DI+4EYIAOaiD63rHeYlZvyh8P+in5999TTSFgUYuKUAjzRI4mdh/p+2A==",
+ "license": "BSD-3-Clause",
+ "dependencies": {
+ "boolean": "^3.0.1",
+ "detect-node": "^2.0.4",
+ "globalthis": "^1.0.1",
+ "json-stringify-safe": "^5.0.1",
+ "semver-compare": "^1.0.0",
+ "sprintf-js": "^1.1.2"
+ },
+ "engines": {
+ "node": ">=8.0"
+ }
+ },
"node_modules/rolldown": {
"version": "1.0.0-rc.15",
"resolved": "https://registry.npmjs.org/rolldown/-/rolldown-1.0.0-rc.15.tgz",
@@ -11329,7 +11589,6 @@
"version": "7.7.3",
"resolved": "https://registry.npmjs.org/semver/-/semver-7.7.3.tgz",
"integrity": "sha512-SdsKMrI9TdgjdweUSR9MweHA4EJ8YxHn8DFaDisvhVlUOe4BF1tLD7GAj0lIqWVl+dPb/rExr0Btby5loQm20Q==",
- "dev": true,
"license": "ISC",
"bin": {
"semver": "bin/semver.js"
@@ -11338,6 +11597,39 @@
"node": ">=10"
}
},
+ "node_modules/semver-compare": {
+ "version": "1.0.0",
+ "resolved": "https://registry.npmjs.org/semver-compare/-/semver-compare-1.0.0.tgz",
+ "integrity": "sha512-YM3/ITh2MJ5MtzaM429anh+x2jiLVjqILF4m4oyQB18W7Ggea7BfqdH/wGMK7dDiMghv/6WG7znWMwUDzJiXow==",
+ "license": "MIT"
+ },
+ "node_modules/serialize-error": {
+ "version": "7.0.1",
+ "resolved": "https://registry.npmjs.org/serialize-error/-/serialize-error-7.0.1.tgz",
+ "integrity": "sha512-8I8TjW5KMOKsZQTvoxjuSIa7foAwPWGOts+6o7sgjz41/qMD9VQHEDxi6PBvK2l0MXUmqZyNpUK+T2tQaaElvw==",
+ "license": "MIT",
+ "dependencies": {
+ "type-fest": "^0.13.1"
+ },
+ "engines": {
+ "node": ">=10"
+ },
+ "funding": {
+ "url": "https://github.com/sponsors/sindresorhus"
+ }
+ },
+ "node_modules/serialize-error/node_modules/type-fest": {
+ "version": "0.13.1",
+ "resolved": "https://registry.npmjs.org/type-fest/-/type-fest-0.13.1.tgz",
+ "integrity": "sha512-34R7HTnG0XIJcBSn5XhDd7nNFPRcXYRZrBB2O2jdKqYODldSzBAqzsWoZYYvduky73toYS/ESqxPvkDf/F0XMg==",
+ "license": "(MIT OR CC0-1.0)",
+ "engines": {
+ "node": ">=10"
+ },
+ "funding": {
+ "url": "https://github.com/sponsors/sindresorhus"
+ }
+ },
"node_modules/set-cookie-parser": {
"version": "2.7.2",
"resolved": "https://registry.npmjs.org/set-cookie-parser/-/set-cookie-parser-2.7.2.tgz",
@@ -11635,7 +11927,6 @@
},
"node_modules/sprintf-js": {
"version": "1.1.3",
- "dev": true,
"license": "BSD-3-Clause"
},
"node_modules/stackback": {
@@ -12390,7 +12681,6 @@
"version": "7.18.2",
"resolved": "https://registry.npmjs.org/undici-types/-/undici-types-7.18.2.tgz",
"integrity": "sha512-AsuCzffGHJybSaRrmr5eHr81mwJU3kjw6M+uprWvCXiNeN9SOGwQ3Jn8jb8m3Z6izVgknn1R0FTCEAP2QrLY/w==",
- "devOptional": true,
"license": "MIT"
},
"node_modules/unified": {
diff --git a/frontend/package.json b/frontend/package.json
index ee0a7afbf..8a1015312 100644
--- a/frontend/package.json
+++ b/frontend/package.json
@@ -23,6 +23,7 @@
},
"dependencies": {
"@floating-ui/react-dom": "^2.1.8",
+ "@huggingface/transformers": "^4.2.0",
"@mantine/colors-generator": "9.1.0",
"@mantine/core": "9.1.0",
"@mantine/dates": "^9.1.0",
@@ -66,6 +67,9 @@
"zod": "^4.3.6",
"zustand": "^5.0.12"
},
+ "overrides": {
+ "sharp": "file:./stubs/sharp"
+ },
"devDependencies": {
"@eslint/compat": "2.0.2",
"@eslint/eslintrc": "3.3.3",
diff --git a/frontend/src/hooks/useLocalTranscribe.ts b/frontend/src/hooks/useLocalTranscribe.ts
new file mode 100644
index 000000000..b06f08587
--- /dev/null
+++ b/frontend/src/hooks/useLocalTranscribe.ts
@@ -0,0 +1,414 @@
+import { useCallback, useEffect, useRef, useState } from 'react';
+import { toast } from 'react-toastify';
+import { resampleToMono16kHz } from 'src/lib/audio-utils';
+import { texts } from 'src/texts';
+
+/** Represents the current state of the local transcription lifecycle. */
+export type LocalTranscribeState = 'idle' | 'downloading' | 'loading' | 'recording' | 'transcribing' | 'error';
+
+/** Tracks bytes loaded and total for the Whisper model download. */
+export interface DownloadProgress {
+ loaded: number;
+ total: number;
+ percentage: number;
+}
+
+/** Configuration for the useLocalTranscribe hook. */
+interface UseLocalTranscribeProps {
+ /** BCP 47 language code ('de' or 'en') passed to the Whisper worker. */
+ language: string;
+ /** Called with the transcribed text after successful transcription. */
+ onTranscriptReceived: (transcript: string) => void;
+ /** Maximum recording duration in milliseconds. Defaults to 2 minutes. */
+ maxDurationMs?: number;
+}
+
+/**
+ * Hook that manages browser-based Whisper speech recognition.
+ * Handles model download, audio recording, and Worker-based transcription.
+ */
+export function useLocalTranscribe({ language, onTranscriptReceived, maxDurationMs = 2 * 60 * 1000 }: UseLocalTranscribeProps) {
+ const [state, setState] = useState<LocalTranscribeState>('idle');
+ const [downloadProgress, setDownloadProgress] = useState<DownloadProgress | null>(null);
+ const [elapsedSeconds, setElapsedSeconds] = useState(0);
+ const [isSupported] = useState(() => {
+ return (
+ typeof Worker !== 'undefined' &&
+ typeof WebAssembly !== 'undefined' &&
+ typeof navigator.mediaDevices?.getUserMedia === 'function' &&
+ self.crossOriginIsolated === true
+ );
+ });
+
+ const workerRef = useRef<Worker | null>(null);
+ const modelLoadedRef = useRef(false);
+ const pendingRecordRef = useRef(false);
+ const mediaRecorderRef = useRef<MediaRecorder | null>(null);
+ const audioChunksRef = useRef<Blob[]>([]);
+ const streamRef = useRef<MediaStream | null>(null);
+ const timerRef = useRef<number | null>(null);
+ const startTimeRef = useRef(0);
+ const onTranscriptReceivedRef = useRef(onTranscriptReceived);
+ const languageRef = useRef(language);
+ const stateRef = useRef(state);
+ const maxDurationMsRef = useRef(maxDurationMs);
+
+ // Keep refs in sync
+ useEffect(() => {
+ onTranscriptReceivedRef.current = onTranscriptReceived;
+ }, [onTranscriptReceived]);
+
+ useEffect(() => {
+ languageRef.current = language;
+ }, [language]);
+
+ useEffect(() => {
+ stateRef.current = state;
+ }, [state]);
+
+ useEffect(() => {
+ maxDurationMsRef.current = maxDurationMs;
+ }, [maxDurationMs]);
+
+ // Cleanup function for stream, timer, and audio chunks
+ const cleanup = useCallback(() => {
+ if (streamRef.current) {
+ streamRef.current.getTracks().forEach((track) => track.stop());
+ streamRef.current = null;
+ }
+ if (timerRef.current) {
+ clearInterval(timerRef.current);
+ timerRef.current = null;
+ }
+ audioChunksRef.current = [];
+ setElapsedSeconds(0);
+ }, []);
+
+ // Internal function to actually begin recording (after model is confirmed loaded)
+ // Uses refs exclusively so it has stable identity
+ const beginRecording = useCallback(async () => {
+ try {
+ const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
+ streamRef.current = stream;
+
+ const mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
+ mediaRecorderRef.current = mediaRecorder;
+
+ audioChunksRef.current = [];
+
+ mediaRecorder.ondataavailable = (event: BlobEvent) => {
+ if (event.data.size > 0) {
+ audioChunksRef.current.push(event.data);
+ }
+ };
+
+ mediaRecorder.onerror = () => {
+ toast.error(texts.chat.localTranscribe.recordingStartFailed);
+ cleanup();
+ setState('error');
+ };
+
+ mediaRecorder.start(100);
+ setElapsedSeconds(0);
+ setState('recording');
+ startTimeRef.current = Date.now();
+
+ // Start duration timer for auto-stop
+ timerRef.current = window.setInterval(() => {
+ const elapsed = Date.now() - startTimeRef.current;
+ setElapsedSeconds(Math.floor(elapsed / 1000));
+ if (elapsed >= maxDurationMsRef.current) {
+ clearInterval(timerRef.current!);
+ timerRef.current = null;
+ toast.info(texts.chat.localTranscribe.maxDurationReached);
+ void stopRecordingRef.current();
+ }
+ }, 100);
+ } catch (err) {
+ if (err instanceof Error && err.name === 'NotAllowedError') {
+ toast.error(texts.chat.localTranscribe.microphonePermissionDenied);
+ } else {
+ toast.error(texts.chat.localTranscribe.recordingStartFailed);
+ }
+ setState('error');
+ cleanup();
+ }
+ }, [cleanup]);
+
+ // Store beginRecording in a ref so handleWorkerMessage doesn't depend on it
+ const beginRecordingRef = useRef(beginRecording);
+ useEffect(() => {
+ beginRecordingRef.current = beginRecording;
+ }, [beginRecording]);
+
+ // Forward ref for stopRecording so the auto-stop interval can call it
+ const stopRecordingRef = useRef<() => Promise<void>>(() => Promise.resolve());
+
+ // Worker message handler -- uses refs exclusively for stable identity
+ const handleWorkerMessage = useCallback((event: MessageEvent) => {
+ const data = event.data as Record<string, unknown>;
+
+ switch (data.status) {
+ case 'download':
+ case 'initiate':
+ // If we were in 'loading' state (mount pre-load), transition to 'downloading'
+ // to indicate a fresh download is happening (not cached)
+ if (stateRef.current === 'loading') {
+ setState('downloading');
+ }
+ break;
+
+ case 'progress':
+ // Per-file progress -- if we're loading, this means download is in progress
+ if (stateRef.current === 'loading') {
+ setState('downloading');
+ }
+ break;
+
+ case 'progress_total':
+ // Aggregate download progress
+ if (stateRef.current === 'downloading' || stateRef.current === 'loading') {
+ if (stateRef.current === 'loading') {
+ setState('downloading');
+ }
+ setDownloadProgress({
+ loaded: data.loaded as number,
+ total: data.total as number,
+ percentage: data.progress as number,
+ });
+ }
+ break;
+
+ case 'done':
+ // Per-file download complete -- no state change needed
+ break;
+
+ case 'ready':
+ modelLoadedRef.current = true;
+ setDownloadProgress(null);
+
+ if (pendingRecordRef.current) {
+ // User clicked record during download -- auto-start recording
+ pendingRecordRef.current = false;
+ void beginRecordingRef.current();
+ } else {
+ setState('idle');
+ }
+ break;
+
+ case 'result': {
+ const text = (data.text as string) ?? '';
+ if (text.trim() === '') {
+ toast.info(texts.chat.localTranscribe.emptyTranscription);
+ } else {
+ onTranscriptReceivedRef.current(text);
+ }
+ setState('idle');
+ break;
+ }
+
+ case 'silence': {
+ toast.info(texts.chat.localTranscribe.silenceDetected);
+ setState('idle');
+ break;
+ }
+
+ case 'error': {
+ const code = data.code as string | undefined;
+ let message: string;
+
+ switch (code) {
+ case 'download_offline':
+ message = texts.chat.localTranscribe.downloadFailedOffline;
+ break;
+ case 'download_timeout':
+ message = texts.chat.localTranscribe.downloadFailedTimeout;
+ break;
+ case 'download_failed':
+ message = texts.chat.localTranscribe.downloadFailed;
+ break;
+ default:
+ message = (data.error as string) || texts.chat.localTranscribe.downloadFailed;
+ }
+
+ toast.error(message);
+ setState('idle');
+ break;
+ }
+ }
+ }, []);
+
+ // Worker initialization on mount -- model is loaded lazily on first record click
+ useEffect(() => {
+ if (!isSupported) return;
+
+ const worker = new Worker(new URL('../workers/whisper.worker.ts', import.meta.url), { type: 'module' });
+ workerRef.current = worker;
+
+ worker.addEventListener('message', handleWorkerMessage);
+
+ return () => {
+ worker.removeEventListener('message', handleWorkerMessage);
+ worker.terminate();
+ workerRef.current = null;
+ };
+ }, [handleWorkerMessage, isSupported]);
+
+ // Stop recording and send to Worker for transcription
+ const stopRecording = useCallback(async () => {
+ if (!mediaRecorderRef.current || stateRef.current !== 'recording') {
+ return;
+ }
+
+ return new Promise<void>((resolve) => {
+ const recorder = mediaRecorderRef.current!;
+
+ recorder.onstop = async () => {
+ if (audioChunksRef.current.length === 0) {
+ cleanup();
+ toast.error(texts.chat.localTranscribe.noAudioRecorded);
+ setState('idle');
+ resolve();
+ return;
+ }
+
+ // Store chunks before cleanup
+ const audioChunks = [...audioChunksRef.current];
+
+ // Stop timer and stream
+ cleanup();
+
+ setState('transcribing');
+
+ try {
+ const audioBlob = new Blob(audioChunks, { type: 'audio/webm' });
+ const audioData = await resampleToMono16kHz(audioBlob);
+
+ // Transfer audio to Worker with Transferable (zero-copy)
+ const worker = workerRef.current;
+ if (!worker) {
+ setState('idle');
+ resolve();
+ return;
+ }
+ worker.postMessage({ type: 'transcribe', audio: audioData, language: languageRef.current }, [audioData.buffer]);
+ } catch {
+ toast.error(texts.chat.localTranscribe.transcriptionFailed);
+ setState('error');
+ }
+
+ resolve();
+ };
+
+ // Request any remaining data before stopping
+ if (recorder.state === 'recording') {
+ recorder.requestData();
+ recorder.stop();
+ } else {
+ // MediaRecorder is already inactive -- resolve immediately
+ cleanup();
+ setState('idle');
+ resolve();
+ }
+ });
+ }, [cleanup]);
+
+ useEffect(() => {
+ stopRecordingRef.current = stopRecording;
+ }, [stopRecording]);
+
+ // Start recording
+ const startRecording = useCallback(async () => {
+ if (stateRef.current !== 'idle' && stateRef.current !== 'error') {
+ return;
+ }
+
+ if (!modelLoadedRef.current) {
+ // Check mic permission BEFORE starting model download
+ try {
+ const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
+ stream.getTracks().forEach((track) => track.stop());
+ } catch (err) {
+ if (err instanceof Error && err.name === 'NotAllowedError') {
+ toast.error(texts.chat.localTranscribe.microphonePermissionDenied);
+ } else {
+ toast.error(texts.chat.localTranscribe.recordingStartFailed);
+ }
+ setState('idle');
+ return;
+ }
+
+ // Mic available -- trigger download and set pending
+ const worker = workerRef.current;
+ if (!worker) {
+ pendingRecordRef.current = false;
+ setState('idle');
+ return;
+ }
+ pendingRecordRef.current = true;
+ setState('downloading');
+ worker.postMessage({ type: 'load' });
+ return;
+ }
+
+ // Model loaded -- start recording immediately
+ await beginRecording();
+ }, [beginRecording]);
+
+ // Toggle recording
+ const toggleRecording = useCallback(async () => {
+ if (stateRef.current === 'idle' || stateRef.current === 'error') {
+ await startRecording();
+ } else if (stateRef.current === 'recording') {
+ await stopRecording();
+ }
+ // Do nothing for 'downloading', 'loading', 'transcribing'
+ }, [startRecording, stopRecording]);
+
+ // Cancel an in-progress model download
+ const cancelDownload = useCallback(() => {
+ if (stateRef.current !== 'downloading') return;
+
+ // Terminate current worker
+ if (workerRef.current) {
+ workerRef.current.removeEventListener('message', handleWorkerMessage);
+ workerRef.current.terminate();
+ workerRef.current = null;
+ }
+
+ // Reset state
+ pendingRecordRef.current = false;
+ modelLoadedRef.current = false;
+ setDownloadProgress(null);
+ setState('idle');
+ toast.info(texts.chat.localTranscribe.downloadCancelled);
+
+ // Create fresh worker for future use
+ const worker = new Worker(new URL('../workers/whisper.worker.ts', import.meta.url), { type: 'module' });
+ workerRef.current = worker;
+ worker.addEventListener('message', handleWorkerMessage);
+ }, [handleWorkerMessage]);
+
+ // Cleanup MediaRecorder on unmount
+ useEffect(() => {
+ return () => {
+ // Stop recorder BEFORE cleanup to preserve proper event ordering
+ if (mediaRecorderRef.current && mediaRecorderRef.current.state === 'recording') {
+ mediaRecorderRef.current.stop();
+ }
+ cleanup();
+ };
+ }, [cleanup]);
+
+ return {
+ state,
+ downloadProgress,
+ isSupported,
+ isRecording: state === 'recording',
+ isTranscribing: state === 'transcribing',
+ isDownloading: state === 'downloading',
+ toggleRecording,
+ cancelDownload,
+ elapsedSeconds,
+ };
+}
diff --git a/frontend/src/hooks/useLocalTranscribe.ui-unit.spec.ts b/frontend/src/hooks/useLocalTranscribe.ui-unit.spec.ts
new file mode 100644
index 000000000..025969768
--- /dev/null
+++ b/frontend/src/hooks/useLocalTranscribe.ui-unit.spec.ts
@@ -0,0 +1,687 @@
+import { act, renderHook } from '@testing-library/react';
+import { toast } from 'react-toastify';
+import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest';
+import { resampleToMono16kHz } from 'src/lib/audio-utils';
+
+// Mock audio-utils
+vi.mock('src/lib/audio-utils', () => ({
+ resampleToMono16kHz: vi.fn().mockResolvedValue(new Float32Array(16000)),
+}));
+
+// Mock react-toastify
+vi.mock('react-toastify', () => ({
+ toast: { error: vi.fn(), info: vi.fn() },
+}));
+
+// Mock texts
+vi.mock('src/texts', () => ({
+ texts: {
+ chat: {
+ localTranscribe: {
+ maxDurationReached: 'Maximum recording duration reached. Transcribing audio...',
+ microphonePermissionDenied: 'Microphone permission denied.',
+ recordingStartFailed: 'Failed to start recording.',
+ noAudioRecorded: 'No audio was recorded.',
+ transcriptionFailed: 'Local transcription failed.',
+ downloadFailed: 'Failed to download speech recognition model.',
+ loadFailed: 'Failed to load speech recognition model.',
+ downloadFailedOffline: 'No internet connection.',
+ downloadFailedTimeout: 'Download timed out.',
+ downloadCancelled: 'Download cancelled.',
+ emptyTranscription: 'No speech could be recognized.',
+ silenceDetected: 'No speech detected.',
+ },
+ },
+ },
+}));
+
+// --- Worker mock infrastructure ---
+interface MockWorker {
+ postMessage: ReturnType<typeof vi.fn>;
+ addEventListener: ReturnType<typeof vi.fn>;
+ removeEventListener: ReturnType<typeof vi.fn>;
+ terminate: ReturnType<typeof vi.fn>;
+ messageHandler: ((event: MessageEvent) => void) | null;
+}
+
+let mockWorkerInstance: MockWorker;
+
+function simulateWorkerMessage(data: Record<string, unknown>) {
+ if (mockWorkerInstance.messageHandler) {
+ mockWorkerInstance.messageHandler({ data } as MessageEvent);
+ }
+}
+
+// Worker class mock -- each instance IS the mockWorkerInstance
+class MockWorkerClass {
+ postMessage: ReturnType<typeof vi.fn>;
+ addEventListener: ReturnType<typeof vi.fn>;
+ removeEventListener: ReturnType<typeof vi.fn>;
+ terminate: ReturnType<typeof vi.fn>;
+
+ constructor() {
+ this.postMessage = vi.fn();
+ this.terminate = vi.fn();
+ this.removeEventListener = vi.fn();
+ this.addEventListener = vi.fn((event: string, handler: (event: MessageEvent) => void) => {
+ if (event === 'message') {
+ mockWorkerInstance.messageHandler = handler;
+ }
+ });
+ // Point the global reference to this instance
+ mockWorkerInstance = {
+ postMessage: this.postMessage,
+ addEventListener: this.addEventListener,
+ removeEventListener: this.removeEventListener,
+ terminate: this.terminate,
+ messageHandler: null,
+ };
+ }
+}
+
+vi.stubGlobal('Worker', MockWorkerClass);
+
+// --- MediaRecorder mock infrastructure ---
+interface MockMediaRecorder {
+ state: string;
+ ondataavailable: ((event: { data: Blob }) => void) | null;
+ onstop: (() => void) | null;
+ onerror: ((event: Event) => void) | null;
+ start: ReturnType<typeof vi.fn>;
+ stop: ReturnType<typeof vi.fn>;
+ requestData: ReturnType<typeof vi.fn>;
+}
+
+let mockMediaRecorderInstance: MockMediaRecorder;
+
+class MockMediaRecorderClass {
+ state: string;
+ ondataavailable: ((event: { data: Blob }) => void) | null;
+ onstop: (() => void) | null;
+ onerror: ((event: Event) => void) | null;
+ start: ReturnType<typeof vi.fn>;
+ stop: ReturnType<typeof vi.fn>;
+ requestData: ReturnType<typeof vi.fn>;
+
+ constructor() {
+ this.state = 'inactive';
+ this.ondataavailable = null;
+ this.onstop = null;
+ this.onerror = null;
+
+ // Point global reference to this instance FIRST so mockImplementation closures capture it
+ // eslint-disable-next-line @typescript-eslint/no-this-alias
+ mockMediaRecorderInstance = this;
+
+ this.start = vi.fn().mockImplementation(() => {
+ mockMediaRecorderInstance.state = 'recording';
+ });
+ this.stop = vi.fn().mockImplementation(() => {
+ mockMediaRecorderInstance.state = 'inactive';
+ if (mockMediaRecorderInstance.onstop) {
+ mockMediaRecorderInstance.onstop();
+ }
+ });
+ this.requestData = vi.fn();
+ }
+}
+
+vi.stubGlobal('MediaRecorder', MockMediaRecorderClass);
+
+// --- Mock stream ---
+const mockTrackStop = vi.fn();
+const mockStream = {
+ getTracks: () => [{ stop: mockTrackStop }],
+};
+
+// Override navigator.mediaDevices.getUserMedia
+const mockGetUserMedia = vi.fn().mockResolvedValue(mockStream);
+Object.defineProperty(navigator, 'mediaDevices', {
+ value: { getUserMedia: mockGetUserMedia },
+ writable: true,
+ configurable: true,
+});
+
+// Import the hook after mocks are set up
+import { useLocalTranscribe } from './useLocalTranscribe';
+
+// Helper to simulate audio data arriving on the MediaRecorder
+function simulateAudioData() {
+ if (mockMediaRecorderInstance.ondataavailable) {
+ mockMediaRecorderInstance.ondataavailable({ data: new Blob(['audio'], { type: 'audio/webm' }) });
+ }
+}
+
+describe('useLocalTranscribe', () => {
+ beforeEach(() => {
+ vi.clearAllMocks();
+ vi.useFakeTimers();
+ // Stub browser capabilities for isSupported check (default: all supported)
+ vi.stubGlobal('WebAssembly', {});
+ vi.stubGlobal('crossOriginIsolated', true);
+ });
+
+ afterEach(() => {
+ vi.useRealTimers();
+ });
+
+ const defaultProps = {
+ language: 'de',
+ onTranscriptReceived: vi.fn(),
+ };
+
+ // Test 1: Initial state
+ it('starts in idle state with downloadProgress null', () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ // Hook starts in idle state with lazy loading (no pre-load on mount)
+ expect(result.current.state).toBe('idle');
+ expect(result.current.downloadProgress).toBeNull();
+ expect(result.current.isRecording).toBe(false);
+ expect(result.current.isTranscribing).toBe(false);
+ expect(result.current.isDownloading).toBe(false);
+ expect(result.current.isSupported).toBe(true);
+ });
+
+ // Test 2: Worker creation on mount (lazy loading - no load message)
+ it('creates Worker on mount and becomes idle on ready', () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ // Worker created but no load message posted (lazy loading)
+ expect(mockWorkerInstance.addEventListener).toHaveBeenCalledWith('message', expect.any(Function));
+
+ act(() => {
+ simulateWorkerMessage({ status: 'ready' });
+ });
+
+ expect(result.current.state).toBe('idle');
+ });
+
+ // Test 3: First click when model not loaded (D-04)
+ it('posts load to Worker on first click when model not loaded, auto-starts recording on ready', async () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ // Send error so hook goes to 'idle' state (model not loaded, error -> idle per D-04)
+ act(() => {
+ simulateWorkerMessage({ status: 'error', error: 'Load failed' });
+ });
+
+ // After error, state is now idle (not error) per D-04/Phase 3 D-13
+ expect(result.current.state).toBe('idle');
+
+ // Now click toggleRecording -- model is not loaded, should set pending and post 'load'
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ expect(result.current.state).toBe('downloading');
+ expect(mockWorkerInstance.postMessage).toHaveBeenCalledWith({ type: 'load' });
+
+ // Simulate ready -- should auto-start recording (beginRecording is async, needs async act)
+ await act(async () => {
+ simulateWorkerMessage({ status: 'ready' });
+ // Allow microtask (getUserMedia promise) to settle
+ await vi.waitFor(() => undefined);
+ });
+
+ expect(result.current.state).toBe('recording');
+ expect(mockGetUserMedia).toHaveBeenCalledWith({ audio: true });
+ });
+
+ // Test 4: Click when model already loaded
+ it('goes directly to recording state when model already loaded', async () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ // Model loaded
+ act(() => {
+ simulateWorkerMessage({ status: 'ready' });
+ });
+ expect(result.current.state).toBe('idle');
+
+ // Click record
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ expect(result.current.state).toBe('recording');
+ expect(mockGetUserMedia).toHaveBeenCalledWith({ audio: true });
+ });
+
+ // Test 5: Download progress (D-08)
+ it('updates downloadProgress on progress_total message', async () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ // Click record to trigger model download (state -> downloading)
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ expect(result.current.state).toBe('downloading');
+
+ act(() => {
+ simulateWorkerMessage({ status: 'progress_total', name: 'model', progress: 50, loaded: 50, total: 100 });
+ });
+
+ expect(result.current.downloadProgress).toEqual({
+ loaded: 50,
+ total: 100,
+ percentage: 50,
+ });
+ });
+
+ // Test 6: Stop recording + transcribe
+ it('stops recording, resamples audio, and posts transcribe to Worker', async () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ // Load model
+ act(() => {
+ simulateWorkerMessage({ status: 'ready' });
+ });
+
+ // Start recording
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ // Simulate audio data via the hook's ondataavailable handler (set on the instance)
+ act(() => {
+ simulateAudioData();
+ });
+
+ // Stop recording
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ // Should have called resampleToMono16kHz
+ expect(resampleToMono16kHz).toHaveBeenCalled();
+
+ // Should have posted transcribe message to Worker
+ expect(mockWorkerInstance.postMessage).toHaveBeenCalledWith(
+ expect.objectContaining({
+ type: 'transcribe',
+ language: 'de',
+ }),
+ expect.any(Array),
+ );
+
+ expect(result.current.state).toBe('transcribing');
+ });
+
+ // Test 7: Transcription result (D-10)
+ it('calls onTranscriptReceived and sets idle on result', async () => {
+ const onTranscriptReceived = vi.fn();
+ const { result } = renderHook(() => useLocalTranscribe({ ...defaultProps, onTranscriptReceived }));
+
+ // Load model
+ act(() => {
+ simulateWorkerMessage({ status: 'ready' });
+ });
+
+ // Start recording
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ // Simulate audio
+ act(() => {
+ simulateAudioData();
+ });
+
+ // Stop recording
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ // Simulate result from Worker
+ act(() => {
+ simulateWorkerMessage({ status: 'result', text: 'hello world' });
+ });
+
+ expect(onTranscriptReceived).toHaveBeenCalledWith('hello world');
+ expect(result.current.state).toBe('idle');
+ });
+
+ // Test 8: Auto-stop at 2 minutes (D-11)
+ it('auto-stops recording after maxDurationMs and shows toast', async () => {
+ const { result } = renderHook(() => useLocalTranscribe({ ...defaultProps, maxDurationMs: 120000 }));
+
+ // Load model
+ act(() => {
+ simulateWorkerMessage({ status: 'ready' });
+ });
+
+ // Start recording
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ expect(result.current.state).toBe('recording');
+
+ // Simulate audio data before auto-stop
+ act(() => {
+ simulateAudioData();
+ });
+
+ // Advance time past 2 minutes
+ act(() => {
+ vi.advanceTimersByTime(120100);
+ });
+
+ expect(toast.info).toHaveBeenCalledWith('Maximum recording duration reached. Transcribing audio...');
+ });
+
+ // Test 9: Transferable transfer (AUDIO-03)
+ it('posts transcribe message with Transferable transfer list', async () => {
+ const mockAudioData = new Float32Array(16000);
+ vi.mocked(resampleToMono16kHz).mockResolvedValue(mockAudioData);
+
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ // Load model
+ act(() => {
+ simulateWorkerMessage({ status: 'ready' });
+ });
+
+ // Start recording
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ // Simulate audio
+ act(() => {
+ simulateAudioData();
+ });
+
+ // Stop recording
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ // Find the transcribe call
+ const transcribeCall = mockWorkerInstance.postMessage.mock.calls.find(
+ (call: unknown[]) => (call[0] as Record<string, unknown>).type === 'transcribe',
+ );
+ expect(transcribeCall).toBeDefined();
+ // Second argument should be the transfer list with the ArrayBuffer
+ expect(transcribeCall![1]).toEqual([mockAudioData.buffer]);
+ });
+
+ // Test 10: Language parameter (D-09)
+ it('passes language parameter to Worker transcribe message', async () => {
+ const { result } = renderHook(() => useLocalTranscribe({ ...defaultProps, language: 'en' }));
+
+ // Load model
+ act(() => {
+ simulateWorkerMessage({ status: 'ready' });
+ });
+
+ // Record and stop
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ act(() => {
+ simulateAudioData();
+ });
+
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ const transcribeCall = mockWorkerInstance.postMessage.mock.calls.find(
+ (call: unknown[]) => (call[0] as Record<string, unknown>).type === 'transcribe',
+ );
+ expect(transcribeCall).toBeDefined();
+ expect((transcribeCall![0] as Record<string, unknown>).language).toBe('en');
+ });
+
+ // Test 11: Error from Worker (with error code)
+ it('sets idle state and shows toast on Worker error with code', () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ act(() => {
+ simulateWorkerMessage({ status: 'error', error: 'Network error', code: 'download_offline' });
+ });
+
+ expect(result.current.state).toBe('idle');
+ expect(toast.error).toHaveBeenCalledWith('No internet connection.');
+ });
+
+ // Test 12: Cleanup on unmount
+ it('terminates Worker and cleans up on unmount', () => {
+ const { unmount } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ // Load model
+ act(() => {
+ simulateWorkerMessage({ status: 'ready' });
+ });
+
+ unmount();
+
+ expect(mockWorkerInstance.terminate).toHaveBeenCalled();
+ expect(mockWorkerInstance.removeEventListener).toHaveBeenCalledWith('message', expect.any(Function));
+ });
+
+ // Test 13: Download blocks recording (D-05)
+ it('does not allow recording during downloading state', async () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ // Trigger download
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+ expect(result.current.state).toBe('downloading');
+
+ // Try to toggle again -- should be a no-op (D-05)
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+ expect(result.current.state).toBe('downloading');
+ });
+
+ // Test 14: isSupported false when Worker missing (ERR-02)
+ it('returns isSupported=false when Worker is not available', () => {
+ const origWorker = globalThis.Worker;
+ // @ts-expect-error -- testing missing API
+ delete globalThis.Worker;
+
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+ expect(result.current.isSupported).toBe(false);
+
+ globalThis.Worker = origWorker;
+ });
+
+ // Test 15: isSupported false when crossOriginIsolated is false (ERR-02)
+ it('returns isSupported=false when crossOriginIsolated is false', () => {
+ vi.stubGlobal('crossOriginIsolated', false);
+
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+ expect(result.current.isSupported).toBe(false);
+ });
+
+ // Test 16: no Worker created when isSupported=false (ERR-02)
+ it('does not create Worker when isSupported is false', () => {
+ vi.stubGlobal('crossOriginIsolated', false);
+
+ renderHook(() => useLocalTranscribe(defaultProps));
+ // Worker constructor should not have been called for the hook
+ // (the mock resets between tests, so postMessage should not have been called)
+ expect(mockWorkerInstance?.postMessage || vi.fn()).not.toHaveBeenCalled();
+ });
+
+ // Test 17: download timeout error mapping (ERR-03)
+ it('maps download_timeout error code to timeout i18n message', () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ act(() => {
+ simulateWorkerMessage({ status: 'error', error: 'Timed out', code: 'download_timeout' });
+ });
+
+ expect(result.current.state).toBe('idle');
+ expect(toast.error).toHaveBeenCalledWith('Download timed out.');
+ });
+
+ // Test 18: download generic error mapping (ERR-03)
+ it('maps download_failed error code to generic download i18n message', () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ act(() => {
+ simulateWorkerMessage({ status: 'error', error: 'Unknown', code: 'download_failed' });
+ });
+
+ expect(result.current.state).toBe('idle');
+ expect(toast.error).toHaveBeenCalledWith('Failed to download speech recognition model.');
+ });
+
+ // Test 19: unknown error code falls back to raw message (ERR-03)
+ it('falls back to raw error message for unknown error codes', () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ act(() => {
+ simulateWorkerMessage({ status: 'error', error: 'Something unexpected' });
+ });
+
+ expect(result.current.state).toBe('idle');
+ expect(toast.error).toHaveBeenCalledWith('Something unexpected');
+ });
+
+ // Test 20: empty transcription shows toast.info (ERR-04)
+ it('shows toast.info and does not insert text for empty transcription', () => {
+ const onTranscriptReceived = vi.fn();
+ const { result } = renderHook(() => useLocalTranscribe({ ...defaultProps, onTranscriptReceived }));
+
+ act(() => {
+ simulateWorkerMessage({ status: 'result', text: '' });
+ });
+
+ expect(result.current.state).toBe('idle');
+ expect(toast.info).toHaveBeenCalledWith('No speech could be recognized.');
+ expect(onTranscriptReceived).not.toHaveBeenCalled();
+ });
+
+ // Test 21: whitespace-only transcription shows toast.info (ERR-04)
+ it('shows toast.info for whitespace-only transcription', () => {
+ const onTranscriptReceived = vi.fn();
+ const { result } = renderHook(() => useLocalTranscribe({ ...defaultProps, onTranscriptReceived }));
+
+ act(() => {
+ simulateWorkerMessage({ status: 'result', text: ' \n ' });
+ });
+
+ expect(result.current.state).toBe('idle');
+ expect(toast.info).toHaveBeenCalledWith('No speech could be recognized.');
+ expect(onTranscriptReceived).not.toHaveBeenCalled();
+ });
+
+ // Test 22: valid transcription still works (regression)
+ it('inserts text for non-empty transcription result', () => {
+ const onTranscriptReceived = vi.fn();
+ const { result } = renderHook(() => useLocalTranscribe({ ...defaultProps, onTranscriptReceived }));
+
+ act(() => {
+ simulateWorkerMessage({ status: 'result', text: 'Hello world' });
+ });
+
+ expect(result.current.state).toBe('idle');
+ expect(onTranscriptReceived).toHaveBeenCalledWith('Hello world');
+ expect(toast.info).not.toHaveBeenCalled();
+ });
+
+ // Test 23: mic denied prevents model download
+ it('does not start model download when mic permission is denied', async () => {
+ mockGetUserMedia.mockRejectedValueOnce(Object.assign(new Error('Permission denied'), { name: 'NotAllowedError' }));
+
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ expect(result.current.state).toBe('idle');
+ expect(toast.error).toHaveBeenCalledWith('Microphone permission denied.');
+ expect(mockWorkerInstance.postMessage).not.toHaveBeenCalledWith({ type: 'load' });
+ });
+
+ // Test 24: cancel download shows toast.info (D-06)
+ it('shows toast.info when download is cancelled', async () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ // Start download
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+ expect(result.current.state).toBe('downloading');
+
+ // Cancel
+ act(() => {
+ result.current.cancelDownload();
+ });
+
+ expect(result.current.state).toBe('idle');
+ expect(toast.info).toHaveBeenCalledWith('Download cancelled.');
+ });
+
+ describe('elapsed seconds', () => {
+ it('should expose elapsedSeconds initially as 0', () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+ expect(result.current.elapsedSeconds).toBe(0);
+ });
+
+ it('should update elapsedSeconds during recording', async () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ // Model loaded
+ act(() => {
+ simulateWorkerMessage({ status: 'ready' });
+ });
+
+ // Start recording
+ await act(async () => {
+ await result.current.toggleRecording();
+ });
+
+ expect(result.current.state).toBe('recording');
+
+ // Advance timer by 3 seconds (3000ms)
+ act(() => {
+ vi.advanceTimersByTime(3000);
+ });
+
+ expect(result.current.elapsedSeconds).toBeGreaterThanOrEqual(2);
+ });
+ });
+
+ describe('silence status handling', () => {
+ it('should show toast.info on silence status', () => {
+ renderHook(() => useLocalTranscribe(defaultProps));
+
+ act(() => {
+ simulateWorkerMessage({ status: 'silence' });
+ });
+
+ expect(toast.info).toHaveBeenCalledWith(expect.stringContaining('No speech detected'));
+ });
+
+ it('should return to idle state on silence status', () => {
+ const { result } = renderHook(() => useLocalTranscribe(defaultProps));
+
+ act(() => {
+ simulateWorkerMessage({ status: 'silence' });
+ });
+
+ expect(result.current.state).toBe('idle');
+ });
+
+ it('should NOT call onTranscriptReceived on silence status', () => {
+ const onTranscriptReceived = vi.fn();
+ renderHook(() => useLocalTranscribe({ ...defaultProps, onTranscriptReceived }));
+
+ act(() => {
+ simulateWorkerMessage({ status: 'silence' });
+ });
+
+ expect(onTranscriptReceived).not.toHaveBeenCalled();
+ });
+ });
+});
diff --git a/frontend/src/hooks/useTranscribe.ts b/frontend/src/hooks/useTranscribe.ts
index 47dd025fa..613c766d7 100644
--- a/frontend/src/hooks/useTranscribe.ts
+++ b/frontend/src/hooks/useTranscribe.ts
@@ -153,7 +153,7 @@ export function useTranscribe({ extensionId, onTranscriptReceived, maxDurationMs
startTimeRef.current = Date.now();
// Start duration timer
- timerRef.current = setInterval(() => {
+ timerRef.current = window.setInterval(() => {
const elapsed = Date.now() - startTimeRef.current;
// Auto-stop if max duration reached
diff --git a/frontend/src/lib/audio-utils.ts b/frontend/src/lib/audio-utils.ts
new file mode 100644
index 000000000..e186523b1
--- /dev/null
+++ b/frontend/src/lib/audio-utils.ts
@@ -0,0 +1,23 @@
+/** Resamples an audio Blob to 16kHz mono Float32Array for Whisper inference. */
+export async function resampleToMono16kHz(audioBlob: Blob): Promise<Float32Array> {
+ const audioContext = new AudioContext();
+
+ try {
+ const arrayBuffer = await audioBlob.arrayBuffer();
+ const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
+
+ const targetSampleRate = 16000;
+ const numSamples = Math.ceil(audioBuffer.duration * targetSampleRate);
+
+ const offlineCtx = new OfflineAudioContext(1, numSamples, targetSampleRate);
+ const source = offlineCtx.createBufferSource();
+ source.buffer = audioBuffer;
+ source.connect(offlineCtx.destination);
+ source.start(0);
+
+ const renderedBuffer = await offlineCtx.startRendering();
+ return renderedBuffer.getChannelData(0).slice();
+ } finally {
+ await audioContext.close();
+ }
+}
diff --git a/frontend/src/lib/audio-utils.ui-unit.spec.ts b/frontend/src/lib/audio-utils.ui-unit.spec.ts
new file mode 100644
index 000000000..a926cf3ff
--- /dev/null
+++ b/frontend/src/lib/audio-utils.ui-unit.spec.ts
@@ -0,0 +1,158 @@
+import { beforeEach, describe, expect, it, vi } from 'vitest';
+
+// Mock audio data
+const mockChannelData = new Float32Array([0.1, 0.2, 0.3, 0.4, 0.5]);
+
+// Track constructor arguments
+let capturedOfflineCtxArgs: unknown[] = [];
+
+// Mock AudioBuffer
+const mockRenderedBuffer = {
+ getChannelData: vi.fn().mockReturnValue(mockChannelData),
+};
+
+// Mock source node
+const mockSource = {
+ buffer: null as AudioBuffer | null,
+ connect: vi.fn(),
+ start: vi.fn(),
+};
+
+// Mock OfflineAudioContext
+const mockOfflineCtx = {
+ createBufferSource: vi.fn().mockReturnValue(mockSource),
+ destination: {},
+ startRendering: vi.fn().mockResolvedValue(mockRenderedBuffer),
+};
+
+// Mock AudioBuffer from decodeAudioData
+const mockDecodedBuffer = {
+ duration: 2.5,
+ sampleRate: 44100,
+ numberOfChannels: 2,
+};
+
+// Mock AudioContext
+const mockAudioContextClose = vi.fn().mockResolvedValue(undefined);
+const mockDecodeAudioData = vi.fn().mockResolvedValue(mockDecodedBuffer);
+
+vi.stubGlobal(
+ 'AudioContext',
+ class MockAudioContext {
+ decodeAudioData = mockDecodeAudioData;
+ close = mockAudioContextClose;
+ },
+);
+
+vi.stubGlobal(
+ 'OfflineAudioContext',
+ class MockOfflineAudioContext {
+ createBufferSource = mockOfflineCtx.createBufferSource;
+ destination = mockOfflineCtx.destination;
+ startRendering = mockOfflineCtx.startRendering;
+
+ constructor(...args: unknown[]) {
+ capturedOfflineCtxArgs = args;
+ }
+ },
+);
+
+// Helper: create a mock Blob with arrayBuffer() support (jsdom Blob lacks it)
+function createMockBlob(): Blob {
+ const blob = new Blob(['test-audio-data'], { type: 'audio/webm' });
+ // jsdom Blob does not implement arrayBuffer(), so we polyfill it
+ if (!blob.arrayBuffer) {
+ blob.arrayBuffer = () => Promise.resolve(new ArrayBuffer(8));
+ }
+ return blob;
+}
+
+describe('resampleToMono16kHz', () => {
+ beforeEach(() => {
+ vi.clearAllMocks();
+ capturedOfflineCtxArgs = [];
+ mockDecodeAudioData.mockResolvedValue(mockDecodedBuffer);
+ mockOfflineCtx.startRendering.mockResolvedValue(mockRenderedBuffer);
+ mockRenderedBuffer.getChannelData.mockReturnValue(mockChannelData);
+ mockAudioContextClose.mockResolvedValue(undefined);
+ });
+
+ it('returns a Float32Array', async () => {
+ const { resampleToMono16kHz } = await import('./audio-utils');
+ const blob = createMockBlob();
+
+ const result = await resampleToMono16kHz(blob);
+
+ expect(result).toBeInstanceOf(Float32Array);
+ });
+
+ it('creates OfflineAudioContext with 1 channel, correct sample count, and 16000 Hz', async () => {
+ const { resampleToMono16kHz } = await import('./audio-utils');
+ const blob = createMockBlob();
+
+ await resampleToMono16kHz(blob);
+
+ // numSamples = Math.ceil(2.5 * 16000) = 40000
+ expect(capturedOfflineCtxArgs).toEqual([1, 40000, 16000]);
+ });
+
+ it('computes numSamples as ceil(duration * 16000)', async () => {
+ // Set a duration that requires ceiling
+ mockDecodeAudioData.mockResolvedValue({
+ ...mockDecodedBuffer,
+ duration: 1.00001,
+ });
+
+ const { resampleToMono16kHz } = await import('./audio-utils');
+ const blob = createMockBlob();
+
+ await resampleToMono16kHz(blob);
+
+ // numSamples = Math.ceil(1.00001 * 16000) = Math.ceil(16000.16) = 16001
+ expect(capturedOfflineCtxArgs[1]).toBe(Math.ceil(1.00001 * 16000));
+ });
+
+ it('calls AudioContext.close() in finally block even after success', async () => {
+ const { resampleToMono16kHz } = await import('./audio-utils');
+ const blob = createMockBlob();
+
+ await resampleToMono16kHz(blob);
+
+ expect(mockAudioContextClose).toHaveBeenCalledTimes(1);
+ });
+
+ it('calls AudioContext.close() in finally block even after error', async () => {
+ mockDecodeAudioData.mockRejectedValue(new Error('Decode failed'));
+
+ const { resampleToMono16kHz } = await import('./audio-utils');
+ const blob = createMockBlob();
+
+ await expect(resampleToMono16kHz(blob)).rejects.toThrow('Decode failed');
+ expect(mockAudioContextClose).toHaveBeenCalledTimes(1);
+ });
+
+ it('returns a slice copy, not a reference to the rendered buffer channel data', async () => {
+ const originalData = new Float32Array([1.0, 2.0, 3.0]);
+ mockRenderedBuffer.getChannelData.mockReturnValue(originalData);
+
+ const { resampleToMono16kHz } = await import('./audio-utils');
+ const blob = createMockBlob();
+
+ const result = await resampleToMono16kHz(blob);
+
+ // Should be a different Float32Array instance (via .slice())
+ expect(result).not.toBe(originalData);
+ // But same values
+ expect(Array.from(result)).toEqual(Array.from(originalData));
+ });
+
+ it('connects source to destination and starts playback', async () => {
+ const { resampleToMono16kHz } = await import('./audio-utils');
+ const blob = createMockBlob();
+
+ await resampleToMono16kHz(blob);
+
+ expect(mockSource.connect).toHaveBeenCalled();
+ expect(mockSource.start).toHaveBeenCalledWith(0);
+ });
+});
diff --git a/frontend/src/pages/chat/conversation/ChatInput.tsx b/frontend/src/pages/chat/conversation/ChatInput.tsx
index 926c004fd..874084479 100644
--- a/frontend/src/pages/chat/conversation/ChatInput.tsx
+++ b/frontend/src/pages/chat/conversation/ChatInput.tsx
@@ -6,9 +6,14 @@ import { ConfigurationDto, FileDto } from 'src/api';
import { Markdown } from 'src/components';
import { ExtensionContext, JSONObject, useEventCallback, useExtensionContext, usePersistentState, useTheme } from 'src/hooks';
import { useSpeechRecognitionToggle } from 'src/hooks/useSpeechRecognitionToggle';
+import { useLocalTranscribe } from 'src/hooks/useLocalTranscribe';
import { useTranscribe } from 'src/hooks/useTranscribe';
+import { DownloadProgressBanner } from './DownloadProgressBanner';
import { FileItemComponent } from 'src/pages/chat/conversation/FileItem';
import { FilterModal } from 'src/pages/chat/conversation/FilterModal';
+import { LocalTranscribeButton } from './LocalTranscribeButton';
+import { PrivacyBadge } from './PrivacyBadge';
+import { RecordingTimer } from './RecordingTimer';
import { Language, SpeechRecognitionButton } from 'src/pages/chat/conversation/SpeechRecognitionButton';
import { TranscribeButton } from 'src/pages/chat/conversation/TranscribeButton';
import { texts } from 'src/texts';
@@ -59,6 +64,8 @@ export function ChatInput({ textareaRef, chatId, configuration, isDisabled, isEm
speechRecognitionLanguages[0].code,
);
+ const [localTranscribeLanguage, setLocalTranscribeLanguage] = useState('de');
+
useEffect(() => {
const defaultValues = configuration?.extensions?.filter(isExtensionWithUserArgs).reduce(
(prev, extension) => {
@@ -177,10 +184,13 @@ export function ChatInput({ textareaRef, chatId, configuration, isDisabled, isEm
});
const voiceExtensions =
- configuration?.extensions?.filter((e) => e.name === 'speech-to-text' || e.name === 'transcribe-azure') ?? [];
+ configuration?.extensions?.filter(
+ (e) => e.name === 'speech-to-text' || e.name === 'transcribe-azure' || e.name === 'transcribe-local',
+ ) ?? [];
const activeVoiceExtension = voiceExtensions[0];
const showSpeechToText = activeVoiceExtension?.name === 'speech-to-text';
const showTranscribe = activeVoiceExtension?.name === 'transcribe-azure';
+ const showLocalTranscribe = activeVoiceExtension?.name === 'transcribe-local';
// Transcribe extension setup
const transcribeExtension = showTranscribe ? activeVoiceExtension : undefined;
@@ -190,6 +200,11 @@ export function ChatInput({ textareaRef, chatId, configuration, isDisabled, isEm
});
const { isRecording, isTranscribing, toggleRecording } = transcribeHook;
+ const localTranscribeHook = useLocalTranscribe({
+ language: localTranscribeLanguage,
+ onTranscriptReceived: setInput,
+ });
+
return (
<>
@@ -230,6 +245,13 @@ export function ChatInput({ textareaRef, chatId, configuration, isDisabled, isEm
)}