18 commits
c09a216
feat(gateway): add speech-to-text dictation proxy via Deepgram Flux
billgetman Feb 8, 2026
f99810d
feat(ui): add browser dictation client with AudioWorklet PCM capture
billgetman Feb 8, 2026
98bebc0
feat(ui): integrate dictation into chat view with keyboard shortcut
billgetman Feb 8, 2026
a2887bc
docs: add dictation design plans and changelog entry
billgetman Feb 8, 2026
9cf566d
fix: resolve type errors in dictation and app-render for upstream com…
billgetman Feb 8, 2026
9694893
fix: cap audio buffer and add connection timeout; strict dictation check
billgetman Feb 9, 2026
34956aa
fix: always show mic button, disable when dictation unavailable
billgetman Feb 9, 2026
aa99f37
feat(dictation): add speech-to-text dictation via Deepgram
billgetman Feb 9, 2026
40712d0
feat: rebrand UI to DeepClaw with Deepgram design system
billgetman Feb 9, 2026
7f604e2
feat(ui): add copy button to logs page
billgetman Feb 9, 2026
ec8c0dc
chore: configure docker-compose for DeepClaw deployment
billgetman Feb 9, 2026
2e9012e
feat(ui): add mobile optimization — PWA, bottom tab bar, keyboard han…
billgetman Feb 10, 2026
3079dfe
feat: add model catalog with provider discovery and agent management UI
billgetman Feb 10, 2026
98b1dc7
Merge pull request #1 from deepgram/feature/model-selection
billgetman Feb 10, 2026
ab10b43
feat: add Deepgram voice agent with SSE streaming proxy and channel U…
billgetman Feb 10, 2026
ce87d14
Merge pull request #2 from deepgram/feature/model-selection
billgetman Feb 10, 2026
433f33a
feat: improve Perplexity web search with search context, recency, dom…
Feb 10, 2026
5a41c74
fix: complete Perplexity cache key and normalize domain filters
Feb 11, 2026
1 change: 1 addition & 0 deletions .nvmrc
@@ -0,0 +1 @@
22
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -6,6 +6,7 @@ Docs: https://docs.openclaw.ai

### Added

- Web UI: add speech-to-text dictation to chat compose via Deepgram Flux — mic button, keyboard shortcut (Cmd/Ctrl+Shift+D), recording indicators, and end-of-thought detection.
- Gateway: add `agents.create`, `agents.update`, `agents.delete` RPC methods for web UI agent management. (#11045) Thanks @advaitpaliwal.
- Gateway: add node command allowlists (default-deny unknown node commands; configurable via `gateway.nodes.allowCommands` / `gateway.nodes.denyCommands`). (#11755) Thanks @mbelinky.
- Plugins: add `device-pair` (Telegram `/pair` flow) and `phone-control` (iOS/Android node controls). (#11755) Thanks @mbelinky.
11 changes: 11 additions & 0 deletions docker-compose.yml
@@ -1,19 +1,30 @@
services:
  openclaw-gateway:
    image: ${OPENCLAW_IMAGE:-openclaw:local}
    user: root
    environment:
      HOME: /home/node
      TERM: xterm-256color
      OPENCLAW_GATEWAY_TOKEN: ${OPENCLAW_GATEWAY_TOKEN}
      CLAUDE_AI_SESSION_KEY: ${CLAUDE_AI_SESSION_KEY}
      CLAUDE_WEB_SESSION_KEY: ${CLAUDE_WEB_SESSION_KEY}
      CLAUDE_WEB_COOKIE: ${CLAUDE_WEB_COOKIE}
      DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
      OPENAI_API_KEY: ${OPENAI_API_KEY:-}
      TWILIO_ACCOUNT_SID: ${TWILIO_ACCOUNT_SID}
      TWILIO_AUTH_TOKEN: ${TWILIO_AUTH_TOKEN}
      PUBLIC_URL: ${PUBLIC_URL:-}
    volumes:
      - ${OPENCLAW_CONFIG_DIR}:/home/node/.openclaw
      - ${OPENCLAW_WORKSPACE_DIR}:/home/node/.openclaw/workspace
      - /Users/billgetman/openclaw/sandboxes:/Users/billgetman/openclaw/sandboxes
      - /Users/billgetman/.docker/run/docker.sock:/var/run/docker.sock
      - ${OPENCLAW_CONFIG_DIR}/docker-cli:/usr/local/bin/docker:ro
      - ./dist:/app/dist:ro
    ports:
      - "${OPENCLAW_GATEWAY_PORT:-18789}:18789"
      - "${OPENCLAW_BRIDGE_PORT:-18790}:18790"
      - "${OPENCLAW_VOICE_PORT:-8000}:3334"
    init: true
    restart: unless-stopped
    command:
214 changes: 214 additions & 0 deletions docs/plans/2026-02-07-dictation-design.md
@@ -0,0 +1,214 @@
# Voice Dictation in Web Chat

**Date:** 2026-02-07
**Status:** Design complete, ready for implementation

## Overview

Add voice dictation to the web chat compose box using Deepgram's Flux model for real-time speech-to-text with intelligent end-of-thought detection.

## User Flow

1. User clicks mic button (or presses `Cmd/Ctrl+Shift+D`)
2. Browser requests mic permission if not already granted
3. Mic button turns red/pulsing to indicate active recording
4. Audio streams to gateway, which proxies to Deepgram Flux
5. Live transcript appears in textarea as user speaks (interim results updating in real-time)
6. Flux detects end-of-thought → recording auto-stops (or user manually stops)
7. Final transcript remains in textarea; user can edit and press Enter to send

## Technical Architecture

### Browser Side

**New module:** `ui/src/ui/dictation.ts`

- Mic access via `navigator.mediaDevices.getUserMedia({ audio: true })`
- Audio capture using `AudioWorklet` (required; no ScriptProcessorNode fallback)
- Output format: linear16 PCM, 16kHz sample rate
- WebSocket connection to gateway dictation endpoint
- Receives transcript events, updates textarea draft state
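
The linear16 output format above implies converting the Web Audio API's 32-bit float samples (nominally in the range -1 to 1) into 16-bit signed integers before sending. A minimal sketch of that conversion follows; `floatToLinear16` is a hypothetical helper name, not a function confirmed to exist in this PR, and it assumes the audio has already been captured at (or resampled to) 16 kHz.

```typescript
// Hypothetical helper: convert Web Audio float samples to linear16 PCM.
// Assumes the input is already at the target sample rate (16 kHz here).
function floatToLinear16(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping.
    const s = Math.max(-1, Math.min(1, input[i]));
    // The negative range has 32768 steps, the positive range 32767.
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}
```

The resulting `Int16Array`'s underlying `buffer` can be sent as a binary WebSocket frame.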

### Gateway Side

**New WebSocket endpoint:** `/dictation/stream`

- Authenticates using existing gateway auth mechanism
- Opens upstream WebSocket to Deepgram:
```
wss://api.deepgram.com/v2/listen?model=flux-general-en&encoding=linear16&sample_rate=16000&interim_results=true&punctuate=true&smart_format=true
```
- Proxies audio chunks: browser → Deepgram
- Proxies transcript events: Deepgram → browser
- Uses existing `DEEPGRAM_API_KEY` from provider config
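
As an illustration of the upstream connection described above, the URL can be assembled from a parameter map rather than hard-coded. This is a sketch only: `FLUX_PARAMS`, `buildDeepgramUrl`, and `upstreamHeaders` are hypothetical names, the parameter values are copied from this design, and the `Token` authorization scheme is assumed from Deepgram's general API-key convention rather than taken from this PR.

```typescript
// Query parameters copied from the design document (not verified against the live API).
const FLUX_PARAMS: Record<string, string> = {
  model: "flux-general-en",
  encoding: "linear16",
  sample_rate: "16000",
  interim_results: "true",
  punctuate: "true",
  smart_format: "true",
};

// Hypothetical helper: build the upstream WebSocket URL.
function buildDeepgramUrl(params: Record<string, string> = FLUX_PARAMS): string {
  const url = new URL("wss://api.deepgram.com/v2/listen");
  url.search = new URLSearchParams(params).toString();
  return url.toString();
}

// Hypothetical helper: the API key stays on the gateway and is never sent to the browser.
function upstreamHeaders(apiKey: string): Record<string, string> {
  return { Authorization: `Token ${apiKey}` };
}
```

Keeping the key in a gateway-side header is the reason for proxying instead of letting the browser connect to Deepgram directly.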

### Message Flow

```
Browser mic
AudioWorklet (PCM chunks, ~80ms)
Gateway WebSocket (/dictation/stream)
Deepgram WebSocket (/v2/listen, Flux)
Transcript events (Results, UtteranceEnd)
Gateway → Browser
Textarea updates
```
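
To make the "~80ms" chunk size in the flow above concrete: at 16 kHz, 80 ms is 1280 samples, or 2560 bytes of 16-bit PCM. An AudioWorklet delivers audio in 128-frame render quanta, so some accumulation step is needed. The sketch below shows one way to do that; `PcmChunker` is a hypothetical class, and it assumes the `AudioContext` runs at 16 kHz so no resampling is required.

```typescript
const SAMPLE_RATE = 16000; // Hz, from the design
const CHUNK_MS = 80; // approximate chunk duration from the flow diagram
const SAMPLES_PER_CHUNK = (SAMPLE_RATE * CHUNK_MS) / 1000; // 1280 samples
const BYTES_PER_CHUNK = SAMPLES_PER_CHUNK * 2; // 2560 bytes at 16 bits per sample

// Hypothetical accumulator: collects small blocks into fixed-size chunks.
class PcmChunker {
  private buf = new Int16Array(SAMPLES_PER_CHUNK);
  private filled = 0;

  // Accepts any number of samples and returns zero or more complete chunks.
  push(samples: Int16Array): Int16Array[] {
    const out: Int16Array[] = [];
    let offset = 0;
    while (offset < samples.length) {
      const n = Math.min(samples.length - offset, SAMPLES_PER_CHUNK - this.filled);
      this.buf.set(samples.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === SAMPLES_PER_CHUNK) {
        out.push(this.buf.slice()); // copy so the caller owns the chunk
        this.filled = 0;
      }
    }
    return out;
  }
}
```

Ten 128-frame blocks fill exactly one chunk, so a chunk would be emitted on every tenth worklet callback under these assumptions.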

### Deepgram Flux Configuration

| Parameter | Value | Purpose |
| ----------------- | ----------------- | ----------------------------------------------- |
| `model` | `flux-general-en` | Conversational model with end-of-turn detection |
| `encoding` | `linear16` | PCM audio format |
| `sample_rate` | `16000` | 16kHz sample rate |
| `interim_results` | `true` | Stream partial transcripts |
| `punctuate` | `true` | Auto-punctuation |
| `smart_format` | `true` | Formatting for numbers, dates, etc. |

Flux provides ~260ms end-of-turn detection latency.

## UI Components

### Mic Button

**Location:** `chat-compose__actions` div, before "New session" button

**States:**

- `idle` - Gray mic icon, clickable
- `recording` - Red pulsing mic icon, clickable to stop
- `disabled` - Grayed out (no Deepgram API key configured)

**Tooltip:**

- When enabled: "Dictate (⌘⇧D)" / "Dictate (Ctrl+Shift+D)"
- When disabled: "Configure Deepgram API key to enable dictation"
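
The button states and tooltips above can be expressed as two small pure functions. This is an illustrative sketch; `micState` and `micTooltip` are hypothetical names, and the way availability and platform are detected is left to the caller.

```typescript
type MicState = "idle" | "recording" | "disabled";

// Hypothetical helper: derive the button state from two flags.
function micState(dictationAvailable: boolean, isRecording: boolean): MicState {
  if (!dictationAvailable) return "disabled";
  return isRecording ? "recording" : "idle";
}

// Hypothetical helper: tooltip strings are copied from the design above.
function micTooltip(state: MicState, isMac: boolean): string {
  if (state === "disabled") return "Configure Deepgram API key to enable dictation";
  return isMac ? "Dictate (⌘⇧D)" : "Dictate (Ctrl+Shift+D)";
}
```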

### Recording Indicator

- Mic icon pulses with CSS animation (`@keyframes pulse`)
- Visual state clearly indicates active recording

### Textarea Behavior

- Interim text may appear in lighter color or italic (distinguishes unconfirmed words)
- Final text renders in normal style as Deepgram confirms
- Existing draft text is preserved; dictated text is inserted at the cursor position
- User can type while recording (both inputs work simultaneously)
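
One way to model the interim/final behavior above is to split the draft at the cursor when dictation starts, then keep finalized and interim text separately so each interim update replaces the previous one. The sketch below is an assumption about how this could work, not the PR's implementation; all names are hypothetical, and it deliberately ignores the harder case of the user typing while recording.

```typescript
interface DictationDraft {
  before: string; // draft text left of the cursor when dictation started
  after: string; // draft text right of the cursor
  committed: string; // finalized transcript so far
  interim: string; // latest unconfirmed transcript
}

function startDraft(text: string, cursor: number): DictationDraft {
  return { before: text.slice(0, cursor), after: text.slice(cursor), committed: "", interim: "" };
}

// Join two fragments with a single space, skipping empty fragments.
function joinWords(a: string, b: string): string {
  if (!a) return b;
  if (!b) return a;
  return a.endsWith(" ") ? a + b : a + " " + b;
}

// Interim results overwrite each other; final results are appended and clear the interim.
function applyTranscript(d: DictationDraft, text: string, isFinal: boolean): DictationDraft {
  return isFinal
    ? { ...d, committed: joinWords(d.committed, text), interim: "" }
    : { ...d, interim: text };
}

function renderDraft(d: DictationDraft): string {
  return d.before + joinWords(d.committed, d.interim) + d.after;
}
```

Rendering the interim portion in a lighter style would need the `committed` and `interim` fields kept separate, which is why they are not merged into one string here.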

### Permission Modal

Triggered when `getUserMedia()` fails with `NotAllowedError`.

**Content:**

- Header: "Microphone Access Required"
- Browser-specific instructions for Chrome, Safari, Firefox, Edge
- Buttons: "Try Again", "Cancel"
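
The trigger condition above can be isolated in a small classifier so the modal logic is testable without a real microphone. `classifyMicError` is a hypothetical helper, and routing every other failure to an inline error is an assumption; the design only specifies the `NotAllowedError` case.

```typescript
type MicErrorAction = "show-permission-modal" | "show-inline-error";

// Hypothetical helper: decide how to surface a getUserMedia() rejection.
function classifyMicError(err: unknown): MicErrorAction {
  const name =
    typeof err === "object" && err !== null && "name" in err
      ? String((err as { name: unknown }).name)
      : "";
  // Only NotAllowedError is specified by the design; the fallback is an assumption.
  return name === "NotAllowedError" ? "show-permission-modal" : "show-inline-error";
}
```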

## Keyboard Shortcut

- **Shortcut:** `Cmd+Shift+D` (macOS) / `Ctrl+Shift+D` (Windows/Linux)
- **Behavior:** Toggles dictation on/off
- **Discoverability:** Shown in mic button tooltip
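
A sketch of the shortcut check follows, written against a minimal event shape so it does not depend on DOM typings. `isDictationShortcut` and `KeyLike` are hypothetical names; a real `KeyboardEvent` satisfies the interface.

```typescript
// Minimal subset of KeyboardEvent used by the check.
interface KeyLike {
  key: string;
  metaKey: boolean;
  ctrlKey: boolean;
  shiftKey: boolean;
  altKey: boolean;
}

// Hypothetical helper: true for Cmd+Shift+D on macOS, Ctrl+Shift+D elsewhere.
function isDictationShortcut(e: KeyLike, isMac: boolean): boolean {
  const primary = isMac ? e.metaKey && !e.ctrlKey : e.ctrlKey && !e.metaKey;
  // With Shift held, `key` is reported as "D", so compare case-insensitively.
  return primary && e.shiftKey && !e.altKey && e.key.toLowerCase() === "d";
}
```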

## Feature Detection

### Gateway Hello Response

Add to gateway hello payload:

```typescript
features: {
  dictation: boolean; // true if DEEPGRAM_API_KEY is configured
}
```
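
On the gateway side, the flag could be derived as sketched below. This assumes the key is read from the process environment; `resolveFeatures` is a hypothetical name. Treating an empty or whitespace-only value as "not configured" matters when the variable is passed through Docker Compose as `${DEEPGRAM_API_KEY}`, which can resolve to an empty string when unset.

```typescript
interface HelloFeatures {
  dictation: boolean;
}

// Hypothetical helper: compute the feature flags for the hello payload.
function resolveFeatures(env: Record<string, string | undefined>): HelloFeatures {
  const key = env.DEEPGRAM_API_KEY;
  // An empty or whitespace-only value counts as "not configured".
  return { dictation: typeof key === "string" && key.trim().length > 0 };
}
```

In a Node process this would be called as `resolveFeatures(process.env)`.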

### Browser Requirements

- `navigator.mediaDevices.getUserMedia` support
- `AudioWorklet` support (Chrome 66+, Firefox 76+, Safari 14.1+)

If AudioWorklet unavailable, mic button is hidden (no fallback for v1).
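
The browser requirements above can be checked with plain feature detection. The sketch takes the global object as a parameter so it can be exercised outside a browser; `browserSupportsDictation` and `BrowserEnv` are hypothetical names.

```typescript
// Minimal shape of the globals the check inspects.
interface BrowserEnv {
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
  AudioWorkletNode?: unknown;
}

// Hypothetical helper: true only when both required APIs are present.
function browserSupportsDictation(env: BrowserEnv): boolean {
  const hasMic = typeof env.navigator?.mediaDevices?.getUserMedia === "function";
  const hasWorklet = typeof env.AudioWorkletNode === "function";
  return hasMic && hasWorklet;
}
```

In the browser, this would be called as `browserSupportsDictation(globalThis as BrowserEnv)`. Note that `navigator.mediaDevices` is only exposed in secure contexts (HTTPS or localhost), so the check also fails on plain HTTP origins.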

## Error Handling

### Connection Failures

| Scenario | Behavior |
| ----------------------- | ----------------------------------------------------------- |
| Gateway WebSocket fails | Inline error: "Dictation unavailable. Check connection." |
| Deepgram upstream fails | Error event to browser: "Transcription service unavailable" |
| Transient failure | Auto-retry once, then show error |
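
The table above can be read as a small decision function: retry once, then surface the message for whichever hop failed. The sketch below encodes that reading. It assumes every connection failure is treated as potentially transient, which the design does not state explicitly; `onConnectFailure` and `MAX_RETRIES` are hypothetical names.

```typescript
type ConnectOutcome =
  | { action: "retry" }
  | { action: "error"; message: string };

const MAX_RETRIES = 1; // "auto-retry once" from the table

// Hypothetical helper: decide what to do after a failed connection attempt.
function onConnectFailure(source: "gateway" | "deepgram", retriesSoFar: number): ConnectOutcome {
  if (retriesSoFar < MAX_RETRIES) return { action: "retry" };
  return {
    action: "error",
    message:
      source === "gateway"
        ? "Dictation unavailable. Check connection."
        : "Transcription service unavailable",
  };
}
```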

### During Recording

| Scenario | Behavior |
| ------------------------ | -------------------------------------------------------- |
| WebSocket disconnects | Stop recording, keep transcript, show brief error |
| User navigates away | Stop recording gracefully (send CloseStream) |
| No audio for 10+ seconds | Subtle hint: "No audio detected. Check your microphone." |
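
The "no audio" hint in the table above needs some definition of silence. One simple option, shown here as a sketch, is to compute the RMS level of each PCM chunk and track how long it has stayed below a threshold. The threshold value, `rms`, and `SilenceWatch` are all assumptions for illustration; the design does not specify how silence is measured.

```typescript
const SILENCE_RMS = 0.01; // assumed threshold, as a fraction of full scale
const SILENCE_HINT_MS = 10000; // "10+ seconds" from the table

// Root-mean-square level of a linear16 chunk, normalized to the range 0..1.
function rms(chunk: Int16Array): number {
  if (chunk.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < chunk.length; i++) {
    const s = chunk[i] / 32768;
    sum += s * s;
  }
  return Math.sqrt(sum / chunk.length);
}

// Hypothetical tracker: reports when the hint should be visible.
class SilenceWatch {
  private lastAudibleMs: number;

  constructor(startMs: number) {
    this.lastAudibleMs = startMs;
  }

  // Call once per chunk; returns true after 10 s without audible input.
  observe(chunk: Int16Array, nowMs: number): boolean {
    if (rms(chunk) >= SILENCE_RMS) this.lastAudibleMs = nowMs;
    return nowMs - this.lastAudibleMs >= SILENCE_HINT_MS;
  }
}
```

Passing the timestamp in, rather than reading the clock inside `observe`, keeps the tracker deterministic under test.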

### Concurrent Usage

- Only one dictation session at a time
- Click mic while recording = stop
- Typing while recording = both work (no conflict)
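
The single-session rule above falls out naturally if dictation is modeled as a small state machine with one toggle event. The reducer below is a sketch under that assumption; the `connecting` state and the event names are hypothetical and are not taken from the PR.

```typescript
type DictationState = "idle" | "connecting" | "recording";
type DictationEvent = "toggle" | "connected" | "end-of-thought" | "disconnect";

// Hypothetical reducer: one session at a time, toggle starts or stops it.
function nextState(state: DictationState, event: DictationEvent): DictationState {
  switch (event) {
    case "toggle":
      // Clicking the mic (or the shortcut) starts from idle, otherwise stops.
      return state === "idle" ? "connecting" : "idle";
    case "connected":
      return state === "connecting" ? "recording" : state;
    case "end-of-thought":
    case "disconnect":
      return "idle";
  }
}
```

Because there is only one state value, a second session cannot start while one is active: a toggle during `connecting` or `recording` always returns to `idle`.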

## Configuration

### Required

- `DEEPGRAM_API_KEY` environment variable (existing)

### No New Config

- Dictation enabled automatically if Deepgram key is present
- No separate toggle to enable/disable dictation feature
- Uses system default microphone (no mic picker)

## Scope

**In scope (v1):**

- Web UI chat only
- Single language (English via `flux-general-en`)
- System default microphone

**Out of scope (future):**

- Native apps (iOS, macOS, Android)
- TUI
- Language selection
- Microphone picker
- Waveform visualization

## Files to Create/Modify

### New Files

- `ui/src/ui/dictation.ts` - Dictation state machine, mic handling, WebSocket client
- `ui/src/ui/audio-worklet.ts` - AudioWorklet processor for PCM capture
- `ui/src/ui/components/mic-permission-modal.ts` - Permission help modal
- `ui/src/styles/dictation.css` - Mic button states, pulse animation
- `src/gateway/server-dictation.ts` - Gateway WebSocket proxy to Deepgram

### Modified Files

- `ui/src/ui/views/chat.ts` - Add mic button to compose area
- `ui/src/ui/app-chat.ts` - Integrate dictation state
- `src/gateway/server.ts` - Register dictation WebSocket endpoint
- `src/gateway/protocol/schema/hello.ts` - Add `features.dictation` field

## Testing

- Unit tests for dictation state machine
- Integration test for gateway proxy (mock Deepgram)
- Manual browser testing for mic permission flows
- E2E test with real Deepgram (live test, requires key)