Add voice agent conciseness controls via SOUL.md and maxTokens
Voice responses were too long for TTS playback. This adds a two-layer
approach: a stricter SOUL.md with hard sentence limits (1-6 sentences)
and a dedicated voice model key with maxTokens capped at 512 tokens,
while leaving other channels unaffected at 8192.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@@ -497,6 +510,10 @@ Add the following to `~/.openclaw/openclaw.json`:
This routes all voice-assistant messages to the `voice-agent` (with the conversational SOUL.md), while Telegram and other channels continue using the default agent with normal rich-text responses.
**Why a separate model key?** OpenClaw's `maxTokens` is set per-model, not per-agent. By creating a dedicated model key (`anthropic/claude-sonnet-4-5-voice`), the voice agent gets a hard 512-token ceiling while other channels keep their default limit (8192). Both keys route to the same underlying Anthropic model — the key is just OpenClaw's internal routing identifier. Combined with the SOUL.md conciseness instructions, this ensures voice responses stay short and natural for TTS.
> **Tip:** If 512 tokens feels too restrictive (responses getting cut off), bump it to `768` or `1024`. For most spoken responses, 512 tokens (~3–5 sentences) is the sweet spot.
### 9c. Restart and Test
```bash
@@ -507,11 +524,16 @@ Say **"Hey Jarvis, tell me about the weather"** — the response should sound na
| Telegram | Default agent | `claude-sonnet-4-5` | 8192 | Normal rich text | N/A |
| Telegram → Voice broadcast | Default agent | `claude-sonnet-4-5` | 8192 | Normal rich text | Yes |

Voice response conciseness is controlled by two independent layers:

1. **SOUL.md** (soft control): instructs the LLM to keep responses to 1–6 sentences. This is the primary lever.
2. **maxTokens** (hard ceiling): caps the voice model at 512 tokens, preventing runaway generation even if the LLM ignores the system prompt.

The TTS sanitizer (`tts-sanitize.ts`) acts as a safety net on **all** text reaching the speaker, regardless of which agent generated it. Even if the voice agent's SOUL.md instructions are followed perfectly, the sanitizer ensures no markdown artifacts slip through.
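
To make the safety-net idea concrete, here is a minimal sketch of the kind of markdown stripping such a sanitizer might perform. It is not the contents of `tts-sanitize.ts`: the function name `stripMarkdownForSpeech` and the specific regex rules are assumptions chosen for illustration, and the emphasis rules are deliberately naive (they would also alter identifiers like `snake_case_name`).

```typescript
// Hypothetical sketch; NOT the actual tts-sanitize.ts implementation.
// Removes common markdown artifacts so a TTS engine does not read them aloud.
function stripMarkdownForSpeech(text: string): string {
  return text
    .replace(/```[\s\S]*?```/g, " ")            // drop fenced code blocks entirely
    .replace(/`([^`]*)`/g, "$1")                 // unwrap inline code
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1")   // keep link/image text, drop the URL
    .replace(/^\s{0,3}#{1,6}\s+/gm, "")          // strip heading markers
    .replace(/^\s{0,3}>\s?/gm, "")               // strip blockquote markers
    .replace(/^\s*(?:[-*+]|\d+\.)\s+/gm, "")     // strip list bullets and numbering
    .replace(/(\*\*|__)(.*?)\1/g, "$2")          // unwrap bold
    .replace(/(\*|_)(.*?)\1/g, "$2")             // unwrap italics (naive)
    .replace(/\s+/g, " ")                        // collapse whitespace and newlines
    .trim();
}

const spoken = stripMarkdownForSpeech(
  "## Weather\n**Sunny** and `warm`, see [forecast](https://example.com).",
);
console.log(spoken);
```

Running the sanitizer on every outgoing string, rather than only on voice-agent output, is what lets broadcast messages from the default agent pass through the same cleanup.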
docs/channels/voice-assistant.md
42 additions & 0 deletions
@@ -96,6 +96,48 @@ When `broadcastAllChannels: true`, messages from ANY channel are spoken via TTS:
3. WhatsApp receives text response.
4. Voice assistant receives TTS playback.
## Controlling response length
Voice responses are spoken aloud, so conciseness matters. There are two layers to control this:
### 1. SOUL.md (primary — soft control)
Bind a dedicated voice agent with its own workspace and SOUL.md that instructs the LLM to keep responses brief. See [RASPBERRY-PI-SETUP.md](/RASPBERRY-PI-SETUP.md) for a full template. Key rules to include:
- Hard sentence limits (1–2 for simple questions, 5–6 max for complex topics)
- No markdown formatting
- Natural speech patterns
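
While testing a SOUL.md that contains these rules, a rough sentence counter can flag replies that exceed the limit. This is only an illustrative sketch: `countSentences` and `withinVoiceLimit` are hypothetical helpers, not part of OpenClaw, and the punctuation-based splitting is a heuristic that will miscount abbreviations such as "Dr.".

```typescript
// Hypothetical test helpers; not part of OpenClaw.
// Counts sentences by splitting after ., ! or ? when followed by whitespace.
function countSentences(text: string): number {
  return text
    .replace(/([.!?])\s+/g, "$1\n") // mark each sentence boundary with a newline
    .split("\n")
    .filter((part) => part.trim().length > 0).length;
}

// True when the reply has at least one sentence and no more than the limit.
// The default of 6 mirrors the upper bound for complex topics.
function withinVoiceLimit(text: string, maxSentences: number = 6): boolean {
  const count = countSentences(text);
  return count >= 1 && count <= maxSentences;
}

console.log(withinVoiceLimit("It's 18.5 degrees outside. Bring a jacket!"));
```

A decimal such as "18.5" is not treated as a boundary because the period is not followed by whitespace.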
### 2. maxTokens (secondary — hard ceiling)
Create a dedicated model key for the voice agent with a low `maxTokens` value. This prevents runaway generation even if the LLM ignores the system prompt.
The dedicated model key (`-voice` suffix) inherits the same underlying model but gets its own `maxTokens`. Other channels keep their default limit. Start with 512 tokens and adjust up if responses feel cut off.
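
As a rough sketch of what such an entry might look like, the snippet below builds the config fragment in TypeScript and prints it as JSON. Only the `agents.defaults.models` path, the `params.maxTokens` field, and the `-voice` suffix come from this commit; the `ModelEntry` and `ModelConfigFragment` types and the `buildVoiceModelFragment` helper are assumptions for illustration, not OpenClaw's documented schema.

```typescript
// Hypothetical types: only agents.defaults.models and params.maxTokens
// are named in this commit; the rest of the shape is assumed.
interface ModelEntry {
  params: { maxTokens: number };
}

interface ModelConfigFragment {
  agents: { defaults: { models: Record<string, ModelEntry> } };
}

// Build a fragment that adds a capped "-voice" key derived from a base model key.
function buildVoiceModelFragment(
  baseKey: string,
  voiceMaxTokens: number,
): ModelConfigFragment {
  return {
    agents: {
      defaults: {
        models: {
          [`${baseKey}-voice`]: { params: { maxTokens: voiceMaxTokens } },
        },
      },
    },
  };
}

const fragment = buildVoiceModelFragment("anthropic/claude-sonnet-4-5", 512);
// Print as JSON so it can be compared against the relevant part of openclaw.json.
console.log(JSON.stringify(fragment, null, 2));
```

Raising the cap to 768 or 1024 is then a one-argument change, which keeps the hard ceiling easy to tune separately from the SOUL.md wording.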
Voice responses are spoken aloud via TTS, so brevity is critical. Two mechanisms work together:

1. **SOUL.md**: Bind a dedicated `voice-agent` with a workspace containing a SOUL.md that enforces sentence limits (1–6 sentences max). This is the primary control.
2. **maxTokens**: Create a voice-specific model key (e.g. `anthropic/claude-sonnet-4-5-voice`) with `params.maxTokens: 512` in `agents.defaults.models`. This acts as a hard ceiling.

See [RASPBERRY-PI-SETUP.md](/RASPBERRY-PI-SETUP.md) for the full configuration example and SOUL.md template.