+- [x] Unreal Speech engine with two-step URI non-streaming, direct streaming
+- [x] Resemble engine with base64 JSON non-streaming, direct streaming
+
+## New Engines to Add
+
+### Lower Priority (Open-Source / Niche)
+
+| Engine | Models | Key Features | Notes |
+|--------|--------|--------------|-------|
+| **fal** | `f5-tts`, `kokoro`, `dia-tts`, `orpheus-tts`, `index-tts-2` | Voice cloning, open-source | No streaming, many sub-models |
+| **Google Gemini TTS** | `gemini-2.5-flash-preview-tts`, `gemini-2.5-pro-preview-tts` | Pseudo-streaming, 23 languages | Different from existing Google Cloud TTS |
+
+## Cross-Cutting Features
+
+### Audio Tags (Cross-Provider Abstraction)
+
+Unified `[tag]` syntax mapped to provider-specific representations:
+- **ElevenLabs v3** - native passthrough (done)
+- **Cartesia sonic-3** - emotions to `<emotion value="..."/>` SSML (done)
+- **OpenAI gpt-4o-mini-tts** - tags to natural language `instructions`
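
The tag mapping listed above could be sketched roughly as follows. This is an illustration only, not the library's implementation: the `Provider` identifiers, the `mapAudioTags` function, the `MappedText` shape, the wording of the generated `instructions` string, and the choice to reuse the tag name as the `<emotion>` value are all assumptions made for this example.

```typescript
// Hypothetical provider identifiers, used only for this sketch.
type Provider =
  | "elevenlabs-v3"
  | "cartesia-sonic-3"
  | "openai-gpt-4o-mini-tts"
  | "plain";

// Hypothetical result shape: the text to send plus optional style instructions.
interface MappedText {
  text: string;
  instructions?: string;
}

// Matches unified audio tags such as [laugh] or [sigh].
const TAG_PATTERN = /\[([a-z][a-z0-9 _-]*)\]/gi;

// Collapse the whitespace left behind once tags are removed.
function tidy(text: string): string {
  return text.replace(/\s{2,}/g, " ").trim();
}

function mapAudioTags(input: string, provider: Provider): MappedText {
  switch (provider) {
    case "elevenlabs-v3":
      // Native passthrough: the provider understands [tag] directly.
      return { text: input };
    case "cartesia-sonic-3":
      // Assumption: the tag name is reused verbatim as the emotion value.
      return {
        text: input.replace(
          TAG_PATTERN,
          (_match, tag: string) => `<emotion value="${tag.toLowerCase()}"/>`
        ),
      };
    case "openai-gpt-4o-mini-tts": {
      // Tags are removed from the text and summarised as natural language.
      const tags: string[] = [];
      const text = tidy(
        input.replace(TAG_PATTERN, (_match, tag: string) => {
          tags.push(tag.toLowerCase());
          return "";
        })
      );
      return tags.length > 0
        ? { text, instructions: `Speak with these cues: ${tags.join(", ")}.` }
        : { text };
    }
    default:
      // Providers without tag support: strip the tags entirely.
      return { text: tidy(input.replace(TAG_PATTERN, "")) };
  }
}
```

Keeping the mapping in a single function makes the per-provider behaviour easy to compare, at the cost of one `switch` that grows with every new engine.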
 | `resemble` | `ResembleTTSClient` | Both | Resemble AI | None (uses fetch API) |
 
 **Factory Name**: Use with `createTTSClient('factory-name', credentials)`
 **Class Name**: Use with direct import `import { ClassName } from 'js-tts-wrapper'`
@@ -90,6 +99,15 @@ A JavaScript/TypeScript library that provides a unified API for working with mul
 | **SherpaOnnx** | ✅ | Estimated | ❌ | Low |
 | **SherpaOnnx-WASM** | ✅ | Estimated | ❌ | Low |
 | **SAPI** | ✅ | Estimated | ❌ | Low |
+| **Cartesia** | ✅ | Estimated | ❌ | Low |
+| **Deepgram** | ✅ | Estimated | ❌ | Low |
+| **Hume** | ✅ | Estimated | ❌ | Low |
+| **xAI** | ✅ | Estimated | ❌ | Low |
+| **Fish Audio** | ✅ | Estimated | ❌ | Low |
+| **Mistral** | ✅ | Estimated | ❌ | Low |
+| **Murf** | ✅ | Estimated | ❌ | Low |
+| **Unreal Speech** | ✅ | Estimated | ❌ | Low |
+| **Resemble** | ✅ | Estimated | ❌ | Low |
 
 **Character-Level Timing**: Only ElevenLabs provides precise character-level timing data via the `/with-timestamps` endpoint, enabling the most accurate word highlighting and speech synchronization.
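
For engines marked "Estimated", word timings have to be derived rather than read from the provider. The diff does not show how the library does this, so the following is only a sketch of one plausible approach: spreading the audio duration across words in proportion to their character counts. The `WordBoundary` type and `estimateWordBoundaries` function are hypothetical names introduced for this example.

```typescript
// Hypothetical shape for an estimated word timing.
interface WordBoundary {
  word: string;
  startMs: number;
  endMs: number;
}

// Assumption: each word takes a share of the total duration proportional
// to its length in characters. Pauses and punctuation are ignored.
function estimateWordBoundaries(text: string, durationMs: number): WordBoundary[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const totalChars = words.reduce((sum, w) => sum + w.length, 0);
  if (totalChars === 0) return [];

  const boundaries: WordBoundary[] = [];
  let consumedChars = 0;
  for (const word of words) {
    const startMs = (consumedChars * durationMs) / totalChars;
    consumedChars += word.length;
    const endMs = (consumedChars * durationMs) / totalChars;
    boundaries.push({ word, startMs, endMs });
  }
  return boundaries;
}
```

An estimate like this drifts wherever the speaker pauses or stretches a word, which is consistent with the "Low" rating in the table and with why provider-supplied character timestamps give better highlighting.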
@@ -253,7 +271,7 @@ async function runExample() {
 runExample().catch(console.error);
 ```
 
-The factory supports all engines: `'azure'`, `'google'`, `'polly'`, `'elevenlabs'`, `'openai'`, `'modelslab'`, `'playht'`, `'watson'`, `'witai'`, `'sherpaonnx'`, `'sherpaonnx-wasm'`, `'espeak'`, `'espeak-wasm'`, `'sapi'`, etc.
+The factory supports all engines: `'azure'`, `'google'`, `'polly'`, `'elevenlabs'`, `'openai'`, `'modelslab'`, `'playht'`, `'watson'`, `'witai'`, `'sherpaonnx'`, `'sherpaonnx-wasm'`, `'espeak'`, `'espeak-wasm'`, `'sapi'`, `'cartesia'`, `'deepgram'`, `'hume'`, `'xai'`, `'fishaudio'`, `'mistral'`, `'murf'`, `'unrealspeech'`, `'resemble'`, etc.
 
 ## Core Functionality
@@ -471,6 +489,15 @@ The following engines **automatically strip SSML tags** and convert to plain tex
 - **PlayHT** - SSML tags are removed, plain text is synthesized
 - **ModelsLab** - SSML tags are removed, plain text is synthesized
 - **SherpaOnnx/SherpaOnnx-WASM** - SSML tags are removed, plain text is synthesized
+- **Cartesia** - SSML tags removed; audio tags (`[laugh]`, `[sigh]`, etc.) mapped to `<emotion>` for sonic-3, stripped for others
+- **Deepgram** - SSML tags are removed, plain text is synthesized
+- **Hume** - SSML tags are removed, plain text is synthesized
+- **xAI** - SSML tags are removed; audio tags passed natively for grok-tts
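
The stripping behaviour described in this list can be approximated with a small helper. This is a minimal sketch under stated assumptions, not the library's code: `stripSsml` is a hypothetical name, the regex-based approach does not handle comments or CDATA sections, and only the five predefined XML entities are decoded.

```typescript
// Hypothetical helper: reduce SSML markup to plain text for engines
// that cannot consume SSML. Square-bracket audio tags such as [laugh]
// are left untouched so a later step can map or strip them.
function stripSsml(ssml: string): string {
  return ssml
    .replace(/<[^>]+>/g, " ") // drop every <tag ...> or </tag>
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&") // decode &amp; last to avoid double decoding
    .replace(/\s+/g, " ")
    .trim();
}
```

Replacing tags with a space rather than an empty string avoids gluing words together when an element such as `<break/>` sits between them; the final whitespace pass then removes the extra spaces.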