Commit 138be31

feat: add kokoro plugin
1 parent aea7b65 commit 138be31

11 files changed

Lines changed: 1445 additions & 30 deletions

examples/tts_kokoro/README.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+# Stream × Kokoro: TTS Bot
+
+Speak into a Stream video call… from Python!
+
+This tiny example spins up a text-to-speech bot that joins a call and greets participants using the open-weight [Kokoro](https://github.com/hexgrad/kokoro) model.
+
+---
+
+## Quick start
+
+```bash
+# clone the repo first if needed, then move into the example
+cd examples/tts_kokoro
+
+# install deps (pick one)
+pip install -e .                                          # classic
+uv venv .venv && source .venv/bin/activate && uv sync    # fast ⚡️
+
+# copy env template and fill in Stream keys (main.py loads examples/.env)
+cp ../stt_deepgram_transcription/env.example ../.env
+$EDITOR ../.env   # STREAM_*
+
+# make sure espeak-ng is installed (macOS example)
+brew install espeak-ng
+
+# The example will auto-bootstrap pip if it's missing; this command is a
+# manual fallback in case you want to do it yourself upfront.
+python -m ensurepip --upgrade   # optional
+
+# run it
+python main.py   # or: uv run python main.py
+```
+
+You'll see the bot join, say a greeting, then wait. Add extra `await tts.send("…")` calls in `main.py` to make it speak more.
+
+---
+
+## How it works (60 sec)
+
+1. Creates two temporary Stream users (human + `tts-bot`).
+2. Opens a browser URL so you can join the call instantly.
+3. Builds an `AudioStreamTrack` at **24 kHz** and connects it to Kokoro.
+4. Joins the call and sends a greeting via `tts.send()`.
+5. `await connection.wait()` keeps the bot alive until **Ctrl-C**.
+6. On shutdown the script deletes the temporary users.
+
+Under 120 lines of code 😀
+
+---
+
+Need help? → [Stream Video docs](https://getstream.io/video/docs/) · [Kokoro README](https://github.com/hexgrad/kokoro/blob/main/README.md)
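
Review note: steps 4 to 6 of "How it works" describe a join, greet, idle, clean-up lifecycle. The sketch below illustrates that shape using only `asyncio`, so it runs without the Stream SDK. It is not code from this commit: `run_bot`, `demo`, and the `events` list are hypothetical stand-ins, and the explicit `task.cancel()` only approximates the cancellation path that stopping the script takes.

```python
import asyncio


async def run_bot(events: list) -> None:
    """Hypothetical stand-in for the bot lifecycle: join, greet, idle, clean up."""
    events.append("join")
    try:
        events.append("greet")
        # Idle until cancelled; stands in for `await connection.wait()`.
        await asyncio.Event().wait()
    except asyncio.CancelledError:
        events.append("stopping")
    finally:
        # Runs on every exit path; stands in for deleting the temporary users.
        events.append("cleanup")


async def demo() -> list:
    events: list = []
    task = asyncio.create_task(run_bot(events))
    await asyncio.sleep(0)  # let the bot reach its idle wait
    task.cancel()           # simulate the cancellation that ends the idle wait
    await task              # returns normally because run_bot handled the cancel
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Putting the cleanup in `finally` rather than after the `try` block is the design point: it runs whether the idle wait ends by cancellation, by an error, or by the call closing normally.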

examples/tts_kokoro/main.py

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
+#!/usr/bin/env python3
+"""
+Example: Text-to-Speech bot with Kokoro
+
+This minimal example shows how to:
+1. Spin up a Stream video call
+2. Attach a Kokoro TTS bot that can speak into the call
+
+Run it, join the call in your browser, and hear the bot greet you 🗣️
+
+Usage::
+    python main.py
+
+The script looks for the following env vars (see `env.example`):
+    STREAM_API_KEY / STREAM_API_SECRET
+
+Kokoro runs fully offline – no extra API key required, but you **must** have
+`espeak-ng` installed and available on the PATH for fallback phoneme
+generation. On macOS: `brew install espeak-ng`.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import logging
+import os
+from uuid import uuid4
+import importlib, sys
+
+from dotenv import load_dotenv
+
+from examples.utils import create_user, open_browser
+from getstream.stream import Stream
+from getstream.video import rtc
+from getstream.video.rtc import audio_track
+from getstream_kokoro import KokoroTTS
+
+logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
+
+os.environ["KOKORO_NO_AUTO_INSTALL"] = "1"  # Disable auto-install of kokoro dependencies
+
+# ---------------------------------------------------------------------------
+# Ensure `pip` is present – uv-created virtual-envs omit it for speed and
+# Kokoro relies on `python -m pip` for optional installs (voices, extras).
+# Run this *before* we import Kokoro.
+# ---------------------------------------------------------------------------
+try:
+    importlib.import_module("pip")
+except ModuleNotFoundError:  # pragma: no cover – only triggers in uv venvs
+    import ensurepip, subprocess  # noqa: WPS433
+
+    print("Boot-strapping pip (uv venv detected – pip missing)…", file=sys.stderr)
+    ensurepip.bootstrap()
+    subprocess.check_call([sys.executable, "-m", "pip", "install", "--upgrade", "pip"])
+
+async def main() -> None:
+    """Create a video call and let a Kokoro TTS bot greet participants."""
+
+    load_dotenv(os.path.join(os.path.dirname(__file__), "..", ".env"))
+
+    client: Stream = Stream.from_env()
+
+    human_id = f"user-{uuid4()}"
+    bot_id = f"tts-bot-{uuid4()}"
+
+    create_user(client, human_id, "Human")
+    create_user(client, bot_id, "TTS Bot")
+
+    logging.info("Created users: %s (human) / %s (bot)", human_id, bot_id)
+
+    token = client.create_token(human_id, expiration=3600)
+
+    call_id = str(uuid4())
+    call = client.video.call("default", call_id)
+    call.get_or_create(data={"created_by_id": bot_id})
+
+    logging.info("📞 Call ready: %s", call_id)
+
+    open_browser(client.api_key, token, call_id)
+
+    # Kokoro produces 24 kHz mono 16-bit PCM
+    track = audio_track.AudioStreamTrack(framerate=24_000)
+
+    # Build TTS pipeline (defaults to American English / af_heart voice)
+    tts = KokoroTTS()
+    tts.set_output_track(track)
+
+    greeting = (
+        "Hello there! I'm a Kokoro text-to-speech bot speaking inside this call. "
+        "As this is a minimal example, I'll stop speaking now."
+    )
+
+    try:
+        async with await rtc.join(call, bot_id) as connection:
+            await connection.add_tracks(audio=track)
+            logging.info("🤖 Bot joined call: %s", call_id)
+
+            await asyncio.sleep(1)
+            # Send greeting once the track is live
+            await tts.send(greeting)
+            logging.info("Sent greeting via TTS")
+
+            logging.info("🎧 Bot is idle – press Ctrl+C to stop")
+            await connection.wait()
+
+    except asyncio.CancelledError:
+        logging.info("Stopping TTS bot…")
+    finally:
+        client.delete_users([human_id, bot_id])
+        logging.info("Cleanup completed")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
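
Review note: the docstring and comments in `main.py` name three prerequisites: the `STREAM_API_KEY` / `STREAM_API_SECRET` variables, `espeak-ng` on the PATH, and an importable `pip`. A preflight check along the following lines could report all of them before any call is created. This is a sketch under stated assumptions, not part of the commit: `missing_prerequisites` and `pip_available` are hypothetical helper names, and only standard-library calls (`shutil.which`, `importlib.util.find_spec`) are used.

```python
import importlib.util
import os
import shutil
from typing import Mapping, Optional, Sequence


def missing_prerequisites(
    env: Optional[Mapping[str, str]] = None,
    required_vars: Sequence[str] = ("STREAM_API_KEY", "STREAM_API_SECRET"),
    required_tools: Sequence[str] = ("espeak-ng",),
) -> list:
    """Return human-readable problems; an empty list means every check passed."""
    env = os.environ if env is None else env
    problems = []
    for name in required_vars:
        # Treat an empty value the same as a missing one.
        if not env.get(name):
            problems.append(f"environment variable {name} is not set")
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems


def pip_available() -> bool:
    """Check for pip without importing it (hypothetical helper)."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("preflight:", problem)
    print("pip importable:", pip_available())
```

Collecting every problem into one list, instead of failing on the first, lets a user fix the environment in a single pass.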

examples/tts_kokoro/pyproject.toml

Lines changed: 10 additions & 13 deletions
@@ -1,29 +1,26 @@
 [project]
-name = "getstream-tts-kokoro-example"
+name = "tts-kokoro-example"
 version = "0.1.0"
-description = "Example project showing how to use Kokoro TTS with GetStream"
+description = "Stream Video + Kokoro TTS demo."
 readme = "README.md"
 requires-python = ">=3.9"
-license = {text = "MIT"}
+license = { text = "MIT" }

 dependencies = [
-    "getstream[webrtc]",
     "python-dotenv>=1.0.0",
-    # Add kokoro dependencies as needed
+    "kokoro>=0.9.4",
+    "misaki[en]>=0.1.0",
+    "soundfile>=0.13.0",
+    "aiortc>=1.10.1",
+    "numpy>=2.0.0"
 ]

 [project.optional-dependencies]
-dev = [
-    "pytest>=7.0.0",
-    "pytest-asyncio>=0.21.0",
-]
+dev = ["pytest>=7.0", "pytest-asyncio>=0.21"]

 [build-system]
 requires = ["setuptools>=61.0", "wheel"]
 build-backend = "setuptools.build_meta"

 [tool.uv.sources]
-getstream = { workspace = true }
-getstream-plugins-stt-deepgram = { workspace = true }
-getstream-plugins-tts-elevenlabs = { workspace = true }
-getstream-plugins-vad-silero = { workspace = true }
+getstream = { workspace = true }

getstream/plugins/deepgram/env-var-api-key

Lines changed: 0 additions & 16 deletions
This file was deleted.

getstream/plugins/deepgram/src/getstream_deepgram/stt.py

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ class DeepgramSTT(STT):
     def __init__(
         self,
         api_key: Optional[str] = None,
-        options: Optional[LiveOptions] = None,
+        options: Optional[LiveOptions] = None,  # type: ignore
         sample_rate: int = 48000,
         language: str = "en-US",
         keep_alive_interval: float = 3.0,

getstream/plugins/kokoro/README.md

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+# GetStream Kokoro Plugin
+
+This package integrates the open-weight [Kokoro-82M TTS model](https://github.com/hexgrad/kokoro) with the GetStream audio/video SDK.
+
+It provides a drop-in `KokoroTTS` class that implements the common `getstream_common.tts.TTS` interface, allowing you to stream PCM audio generated by Kokoro directly into a WebRTC `AudioStreamTrack`.
+
+```py
+from getstream_kokoro import KokoroTTS
+from getstream.video.rtc.audio_track import AudioStreamTrack
+
+track = AudioStreamTrack(framerate=24_000)
+
+tts = KokoroTTS(lang_code="a", voice="af_heart")
+tts.set_output_track(track)
+
+await tts.send("Hello from Kokoro!")  # call from inside an async function
+```
+
+## Installation
+
+```bash
+pip install getstream-plugins-kokoro
+```
+
+This will pull in the declared `kokoro`, `soundfile` and `getstream[webrtc]` dependencies. You also need `espeak-ng` **at runtime** for pronunciation fallback. On macOS you can install it with Homebrew:
+
+```bash
+brew install espeak-ng
+```
+
+## Configuration options
+
+| Parameter | Default | Description |
+|-----------|---------|-------------|
+| `lang_code` | `"a"` | Language group passed to `KPipeline` (`"a"` = American English, etc.) |
+| `voice` | `"af_heart"` | Kokoro voice preset. See the [model card](https://huggingface.co/NeuML/kokoro-int8-onnx#speaker-reference) for available options. |
+| `speed` | `1.0` | Playback speed multiplier. |
+| `sample_rate` | `24000` | Output sample rate (fixed by Kokoro). **The attached `AudioStreamTrack` must use the same value.** |
+
+## Development
+
+Run the unit tests with:
+
+```bash
+pytest -q getstream/plugins/kokoro/tests
+```
+
+No network calls are made – the Kokoro SDK is fully mocked.
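
Review note: the `sample_rate` row requires the track and the TTS to agree on 24 kHz. The arithmetic behind that requirement can be sketched as follows. The 16-bit mono layout is an assumption taken from the comment in the example's `main.py`, the function names are hypothetical, and the speed calculation assumes nothing resamples the audio between the TTS and the track; the plugin itself may handle a mismatch differently.

```python
KOKORO_SAMPLE_RATE = 24_000  # Hz, the fixed output rate from the table above
BYTES_PER_SAMPLE = 2         # assumption: 16-bit PCM
CHANNELS = 1                 # assumption: mono


def pcm_byte_count(duration_s: float, sample_rate: int = KOKORO_SAMPLE_RATE) -> int:
    """Bytes of raw PCM needed to hold `duration_s` seconds of audio."""
    samples = round(duration_s * sample_rate)
    return samples * BYTES_PER_SAMPLE * CHANNELS


def playback_speed(source_rate: int, track_rate: int) -> float:
    """Speed factor when samples made at `source_rate` are consumed at `track_rate`."""
    return track_rate / source_rate


def ensure_matching_rates(tts_rate: int, track_rate: int) -> None:
    """Raise early instead of letting mismatched audio play at the wrong speed."""
    if tts_rate != track_rate:
        raise ValueError(
            f"TTS produces {tts_rate} Hz but the track expects {track_rate} Hz"
        )


if __name__ == "__main__":
    print(pcm_byte_count(1.0))             # bytes in one second of audio
    print(playback_speed(24_000, 48_000))  # factor for a 48 kHz track
```

Under these assumptions one second of speech occupies 48,000 bytes, and feeding 24 kHz samples to a 48 kHz track would play them at twice the intended speed, which is why a guard like `ensure_matching_rates` is cheap insurance.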

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[project]
+name = "getstream-plugins-kokoro"
+version = "0.1.0"
+description = "Kokoro TTS plugin for GetStream"
+readme = "README.md"
+requires-python = ">=3.10"
+license = "MIT"
+dependencies = [
+    "getstream[webrtc]",
+    "kokoro>=0.9.4",
+    "soundfile>=0.13.0",
+]
+
+[project.optional-dependencies]
+test = [
+    "pytest>=7.0.0",
+    "pytest-asyncio>=0.18.0",
+]
+
+[tool.hatch.build.targets.wheel]
+packages = ["src/getstream_kokoro"]
+
+[tool.uv.sources]
+getstream = { workspace = true }

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+from .tts import KokoroTTS
+
+__all__ = ["KokoroTTS"]
