feat(VoiceServer): local TTS support, voice personalities, and automatic ElevenLabs fallback #1101

Open
salaheldinaz wants to merge 4 commits into danielmiessler:main from salaheldinaz:main

@salaheldinaz

Summary

  • Fix .env path bug — server loaded ~/.env but PAI stores the key at ~/.config/PAI/.env, causing elevenlabs_api_key_configured to always report false and silently skipping all TTS audio
  • Add local TTS engine — uses macOS built-in say command (zero dependencies, no API key) via new playLocalSpeech()
  • Automatic ElevenLabs fallback — when ElevenLabs fails for any reason (402 unpaid voice, network error, missing key), server silently retries with local TTS so voice notifications always play
  • Voice personality catalogue — curated list of 22 realistic English macOS voices with accent, gender, and personality descriptions grouped into natural and classic categories
  • GET /voices/local endpoint — lists voices installed on the system with the active voice marked and instructions for switching
  • Improved /health response: now exposes tts_provider, local_voice, and local_tts_available; renames api_key_configured → elevenlabs_api_key_configured
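
The fallback behaviour in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `speakWithFallback`, `SpeakResult`, and the injected `playElevenLabs` wrapper are hypothetical names, and only `playLocalSpeech` is named in the description. The sketch assumes any thrown error from the ElevenLabs path (402, network error, missing key) triggers one retry with local TTS.

```typescript
// Hypothetical sketch of "try ElevenLabs, fall back to local TTS".
// Both providers are injected so the control flow can be exercised without audio.
type Speak = (text: string) => Promise<void>;

interface SpeakResult {
  provider: "elevenlabs" | "local";
  fallbackReason?: string; // set only when the local fallback was used
}

async function speakWithFallback(
  text: string,
  playElevenLabs: Speak, // assumed wrapper around the ElevenLabs request
  playLocalSpeech: Speak, // per the PR, wraps the macOS `say` command
): Promise<SpeakResult> {
  try {
    await playElevenLabs(text);
    return { provider: "elevenlabs" };
  } catch (err) {
    // Any failure falls through to local TTS so the notification still plays.
    const reason = err instanceof Error ? err.message : String(err);
    await playLocalSpeech(text);
    return { provider: "local", fallbackReason: reason };
  }
}
```

Returning the fallback reason rather than discarding it is a design choice of this sketch. It keeps the retry silent for the caller while still leaving something to log.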

Configuration

No breaking changes. Existing installs behave identically. To customise local TTS, add the following to ~/.claude/settings.json:

"voiceServer": {
  "tts_provider": "local",
  "local_voice": "Daniel"
}

tts_provider options: "elevenlabs" (default — ElevenLabs with local fallback) or "local" (always local, no API call).
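
One way the server might resolve these settings with their defaults is sketched below. The function and type names are hypothetical. The "Samantha" default comes from the first commit message, and treating any unrecognised tts_provider value as "elevenlabs" is an assumption of this sketch. It covers only the two values documented here, not the kokoro provider added in a later commit.

```typescript
// Hypothetical sketch: resolve voiceServer settings with defaults.
type TtsProvider = "elevenlabs" | "local";

interface VoiceConfig {
  ttsProvider: TtsProvider;
  localVoice: string;
}

// Narrow an unknown value to a plain key/value record, or an empty one.
function asRecord(value: unknown): Record<string, unknown> {
  return typeof value === "object" && value !== null
    ? (value as Record<string, unknown>)
    : {};
}

// `settings` is the parsed content of ~/.claude/settings.json.
function resolveVoiceConfig(settings: unknown): VoiceConfig {
  const voiceServer = asRecord(asRecord(settings)["voiceServer"]);
  const provider = voiceServer["tts_provider"];
  const voice = voiceServer["local_voice"];
  return {
    // Assumption: anything other than "local" keeps the default provider.
    ttsProvider: provider === "local" ? "local" : "elevenlabs",
    // Default voice per the commit message: Samantha.
    localVoice: typeof voice === "string" && voice.trim() !== "" ? voice : "Samantha",
  };
}
```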

Browse available voices on your system:

curl http://localhost:8888/voices/local

Test plan

  • /health returns local_tts_available: true and correct tts_provider
  • GET /voices/local returns catalogue with active voice marked
  • ElevenLabs 402 / missing key → local TTS plays, response still 200
  • tts_provider: "local" → local TTS used directly, no ElevenLabs call
  • tts_provider: "elevenlabs" with valid paid key → ElevenLabs used, unchanged behaviour
  • .env loaded correctly from ~/.config/PAI/.env
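
For the last item, the lookup order described in the commit notes (~/.config/PAI/.env first, then ~/.env) could be checked with a sketch like this one. The helper names are hypothetical, and the `exists` predicate is injected so the ordering can be tested without touching the filesystem. The KEY=VALUE parser is a deliberately minimal assumption, not the server's actual loader.

```typescript
// Hypothetical sketch of the .env lookup order and a minimal parser.

// Candidate locations in priority order: PAI standard location first.
function envCandidates(home: string): string[] {
  return [`${home}/.config/PAI/.env`, `${home}/.env`];
}

// Returns the first candidate for which `exists` reports true, or null.
function pickEnvFile(home: string, exists: (path: string) => boolean): string | null {
  for (const candidate of envCandidates(home)) {
    if (exists(candidate)) return candidate;
  }
  return null;
}

// Parses KEY=VALUE lines; skips blank lines, # comments, and lines without "=".
function parseEnv(content: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    out[key] = value;
  }
  return out;
}
```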

🤖 Generated with Claude Code

salaheldinaz and others added 4 commits April 25, 2026 18:59
…lback

- Fix .env loading path: try ~/.config/PAI/.env first (PAI standard location),
  fall back to ~/.env — resolves api_key_configured always showing false
- Add playLocalSpeech() using macOS say command (no API key required)
- Add tts_provider config: set voiceServer.tts_provider = "local" in settings.json
  to use local TTS exclusively; defaults to "elevenlabs"
- Add localVoice config: voiceServer.local_voice sets the macOS voice (default: Samantha)
- Automatic fallback: when ElevenLabs fails (402, network error, etc.), server
  silently retries with local TTS so notifications always play
- Update /health endpoint: exposes tts_provider, local_voice, local_tts_available,
  renames api_key_configured → elevenlabs_api_key_configured for clarity
- Update startup log to show active TTS mode

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…local endpoint

- Add curated LOCAL_VOICE_CATALOGUE with 22 realistic English voices grouped
  by category (natural/classic) with accent, gender, and personality descriptions
- Add getInstalledLocalVoices() that cross-references the catalogue against
  voices actually installed on the system via `say -v ?`
- Add GET /voices/local endpoint: returns full catalogue with active voice
  marked and instructions for switching (set voiceServer.local_voice in
  settings.json)
- Surface /voices/local in the root response for discoverability

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
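
The cross-referencing described in the commit above might look roughly like the sketch below. The function names, the `CatalogueEntry` shape, and the assumed `say -v ?` line format ("Name   en_US   # sample sentence") are all assumptions made for illustration. The actual getInstalledLocalVoices() may differ.

```typescript
// Hypothetical sketch: mark catalogue voices as installed and active.
interface CatalogueEntry {
  name: string;
  category: "natural" | "classic";
  description: string;
}

interface ListedVoice extends CatalogueEntry {
  installed: boolean;
  active: boolean;
}

// Extracts voice names from `say -v ?` output.
// Assumes each line is "<name> <whitespace> <locale> <whitespace> # <sample>".
function parseSayVoices(output: string): string[] {
  const names: string[] = [];
  for (const line of output.split(/\r?\n/)) {
    const match = /^(.+?)\s+([A-Za-z]{2,3}[_-]\S+)\s+#/.exec(line);
    const name = match?.[1];
    if (name !== undefined) names.push(name.trim());
  }
  return names;
}

// Cross-references the curated catalogue against the installed voice names.
function listLocalVoices(
  catalogue: CatalogueEntry[],
  installedNames: string[],
  activeVoice: string,
): ListedVoice[] {
  const installed = new Set(installedNames);
  return catalogue.map((entry) => ({
    ...entry,
    installed: installed.has(entry.name),
    active: entry.name === activeVoice,
  }));
}
```

Keeping the parser separate from the cross-reference step means the regular expression is the only part that needs to change if the `say` output format differs between macOS versions.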
… fallback

Merges feature/local-tts-fallback into main.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds kokoro-fastapi as a third TTS provider alongside ElevenLabs and
local macOS say. Configure via settings.json voiceServer.tts_provider,
kokoro_url, and kokoro_voice. Falls back to local TTS on connection
failure.

Also extracts shared preprocessForTTS() helper (removes pronunciation
log duplication between providers) and fixes a temp file leak in
playAudio's error handler.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>