Commit 0d510ae

fix moonshine example
1 parent ce05355 commit 0d510ae

4 files changed

Lines changed: 113 additions & 302 deletions

Lines changed: 48 additions & 185 deletions
@@ -1,209 +1,72 @@
-# Moonshine STT Transcription Example
+# Stream × Moonshine + Silero — Live Transcription

-This example demonstrates real-time call transcription using the Moonshine Speech-to-Text plugin with GetStream Video SDK.
+This example spins up a bot that joins a Stream Video call, detects speech
+with **Silero VAD**, transcribes it with the **Moonshine** model, and prints
+final transcripts to the terminal.

-## Features
-
-- **Real-time Transcription**: Process audio from video calls using Moonshine STT
-- **Voice Activity Detection**: Integrated Silero VAD to filter speech from silence
-- **Efficient Processing**: Only transcribe actual speech, reducing computational overhead
-- **Performance Monitoring**: Track transcription speed, accuracy, and resource usage
-- **Model Selection**: Choose between `moonshine/tiny` (fast) and `moonshine/base` (accurate)
-- **Configurable Processing**: Adjust VAD sensitivity and STT parameters
-
-## Prerequisites
-
-1. **GetStream Account**: Get your API key from [GetStream Dashboard](https://dashboard.getstream.io/)
-2. **Moonshine Library**: Install the Moonshine STT library
-3. **Python 3.9+**: Required for the GetStream SDK
-
-## Installation
-
-1. **Install Moonshine STT Library**:
-   ```bash
-   pip install useful-moonshine@git+https://github.com/usefulsensors/moonshine.git
-   ```
-
-2. **Install Example Dependencies**:
-   ```bash
-   # From the example directory
-   uv sync
-   ```
-
-3. **Configure Environment**:
-   ```bash
-   cp env.example .env
-   # Edit .env with your GetStream API key
-   ```
-
-## Configuration
-
-Edit the `.env` file with your settings:
-
-```env
-# Required: Your GetStream API key
-STREAM_API_KEY=your_stream_api_key_here
-
-# Optional: Moonshine model selection (default: moonshine/base)
-MOONSHINE_MODEL=moonshine/base # or moonshine/tiny
-
-# Demo Configuration
-DEMO_DURATION_SECONDS=60 # Demo runtime (seconds)
+Pipeline:
+```
+WebRTC audio ▶︎ Silero VAD ▶︎ Moonshine STT ▶︎ print transcript
 ```

-### Model Comparison
-
-| Model | Size | Speed | Accuracy | Use Case |
-|-------|------|-------|----------|----------|
-| `moonshine/tiny` | ~190MB | 5-10x real-time | Good | Real-time applications, resource-constrained |
-| `moonshine/base` | ~400MB | 3-5x real-time | Better | **Default** - Higher accuracy requirements |
-
-## Usage
-
-### Basic Demo
+---

-Run the transcription demo:
+## Quick start

 ```bash
-python main.py
-```
+cd examples/stt_moonshine_transcription

-This will:
-1. Initialize Moonshine STT with your chosen model
-2. Set up Voice Activity Detection (if enabled)
-3. Display configuration and wait for audio input
-4. Show periodic statistics during runtime
+# create & activate env (fast, no pip)
+uv venv .venv && source .venv/bin/activate

-### Integration with Video Calls
+# install everything declared in this folder's pyproject.toml
+uv sync

-To integrate with actual video calls, modify the `run_transcription_demo` method:
-
-```python
-# Example integration (pseudo-code)
-async def process_call_audio(call_id: str):
-    # Initialize components
-    stt = Moonshine()
-    vad = Silero(sample_rate=16000, speech_pad_ms=300, min_speech_ms=250)
-
-    # Set up VAD -> STT pipeline
-    @vad.on("audio")
-    async def on_speech_detected(pcm_data, user):
-        await stt.process_audio(pcm_data, user)
-
-    # Join the call
-    call = client.video.call("default", call_id)
-    async with await rtc.join(call, "bot-user") as connection:
-        @connection.on("audio")
-        async def on_audio(pcm_data, user):
-            # Process all audio through VAD first
-            await vad.process_audio(pcm_data, user)
+# copy credentials and run
+cp env.example .env # fill STREAM_* keys
+python main.py # or: uv run python main.py
 ```

-## Performance Characteristics
-
-### Expected Performance (on modern hardware)
-
-- **Real-time Factor**: 0.1-0.3x (processes 3-10x faster than real-time)
-- **Latency**: 100-300ms for 1-second audio chunks
-- **Memory Usage**: 200-400MB depending on model
-- **CPU Usage**: 10-30% on modern CPUs
-
-### Optimization Tips
-
-1. **Model Selection**:
-   - Use `moonshine/base` for best balance of accuracy and performance (**default**)
-   - Use `moonshine/tiny` for maximum speed on resource-constrained devices
-
-2. **Chunk Duration**:
-   - Smaller chunks (500-1000ms): Lower latency, more processing overhead
-   - Larger chunks (1000-2000ms): Higher latency, better efficiency
-
-3. **VAD Integration**:
-   - Silero VAD automatically filters out silence
-   - Only processes actual speech, reducing computational overhead
-   - Configured with optimal settings: 300ms padding, 250ms minimum speech, 0.3/0.2 activation/deactivation thresholds
+You'll see something like:

-## Troubleshooting
-
-### Common Issues
-
-1. **Moonshine Import Error**:
-   ```
-   ImportError: No module named 'moonshine'
-   ```
-   **Solution**: Install Moonshine library:
-   ```bash
-   pip install useful-moonshine@git+https://github.com/usefulsensors/moonshine.git
-   ```
-
-2. **CUDA/GPU Issues**:
-   ```
-   RuntimeError: CUDA out of memory
-   ```
-   **Solution**: Force CPU usage:
-   ```python
-   stt = Moonshine(device="cpu")
-   ```
+```text
+🌙 Stream + Moonshine Real-time Transcription Example
+📞 Call ID: 4b12…
+✅ Bot joined call: 4b12…
+🎧 Listening for audio… (Press Ctrl+C to stop)
+🎤 Speech detected from user: My User, duration: 1.12s
+[14:03:27] My User: hello moonshine
+```

-3. **No Transcriptions**:
-   - Check audio input levels
-   - Verify VAD settings (try disabling VAD)
-   - Ensure minimum audio length requirements are met
+---

-4. **Poor Performance**:
-   - Try the `moonshine/tiny` model
-   - Increase chunk duration
-   - Check system resources (CPU/memory)
+## How it works (short version)

-### Debug Mode
+`main.py` does the following:

-Enable debug logging for detailed information:
+1. Creates two temporary users (`create_user`) – **human** and **moonshine-bot**.
+2. Generates a random `call_id`, creates the call, and opens a join URL in your browser.
+3. Initialises:
+   * `Silero()` – voice-activity detector (48 kHz by default).
+   * `Moonshine()` – STT model (base or tiny, picked in the plugin).
+4. Joins the call with `rtc.join()`, then:

 ```python
-import logging
-logging.basicConfig(level=logging.DEBUG)
-```
-
-## Example Output
+@connection.on("audio")
+async def on_pcm(pcm, user):
+    await vad.process_audio(pcm, user) # silence filtered here

+@vad.on("audio")
+async def on_speech(pcm, user):
+    await stt.process_audio(pcm, user)
 ```
-🌙 Stream + Moonshine Real-time Transcription Example
-===================================================
-📞 Call ID: 12345678-1234-1234-1234-123456789abc
-🔑 Created token for browser user: browser-user
-🤖 Created token for bot user: transcription-bot
-📞 Call created: 12345678-1234-1234-1234-123456789abc
-Opening browser to: https://pronto.getstream.io/bare/join/...
-
-🤖 Starting transcription bot...
-The bot will join the call and transcribe speech using VAD + Moonshine STT.
-VAD will filter out silence and only process actual speech.
-Join the call in your browser and speak to see transcriptions appear here!
-
-🌙 Initializing Moonshine STT...
-🔊 Initializing Silero VAD...
-✅ Audio processing pipeline ready: VAD → Moonshine STT
-✅ Bot joined call: 12345678-1234-1234-1234-123456789abc
-🎧 Listening for audio... (Press Ctrl+C to stop)
-
-🎤 Speech detected from user: browser-user, duration: 2.34s
-[14:30:25] browser-user: Hello, this is a test of the Moonshine transcription system.
-└─ model: moonshine/base, device: cpu, RTF: 0.10x
-
-🧹 Cleanup completed
-```
-
-## Next Steps

-1. **Integrate with Real Calls**: Modify the example to process actual call audio
-2. **Add Persistence**: Store transcriptions in a database
-3. **Implement Webhooks**: Send transcriptions to external services
-4. **Add Language Support**: Extend for multiple languages (when supported by Moonshine)
-5. **Custom Models**: Train custom Moonshine models for specific domains
+5. `Moonshine` emits a final `transcript` event which is printed with a timestamp.
+6. On **Ctrl-C** the script closes the STT client, VAD and deletes the temporary users.

-## Resources
+---

-- [Moonshine GitHub Repository](https://github.com/usefulsensors/moonshine)
-- [GetStream Video SDK Documentation](https://getstream.io/video/docs/)
-- [GetStream Python SDK](https://github.com/GetStream/stream-python)
-- [Voice Activity Detection with Silero](https://github.com/snakers4/silero-vad)
+Need help?
+* Stream Video docs – <https://getstream.io/video/docs/>
+* Silero VAD – <https://github.com/snakers4/silero-vad>
+* Moonshine model – <https://github.com/usefulsensors/moonshine>
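
The handler chain that the new README describes (connection audio, then VAD, then STT, then a printed transcript) can be tried without the SDK or a model. The sketch below is only an illustration of the event wiring: `Emitter`, `FakeVAD`, `FakeSTT`, and `run_pipeline` are hypothetical stand-ins written for this note, not the Stream, Silero, or Moonshine classes, and the amplitude threshold is an arbitrary assumption.

```python
import asyncio
from collections import defaultdict


class Emitter:
    """Minimal async event emitter (hypothetical stand-in for the SDK's event API)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event):
        # Used as a decorator: @obj.on("audio")
        def register(handler):
            self._handlers[event].append(handler)
            return handler
        return register

    async def emit(self, event, *args):
        for handler in self._handlers[event]:
            await handler(*args)


class FakeVAD(Emitter):
    """Forwards a chunk only if its peak amplitude reaches a threshold."""

    def __init__(self, threshold=0.1):
        super().__init__()
        self.threshold = threshold

    async def process_audio(self, pcm, user):
        if pcm and max(abs(s) for s in pcm) >= self.threshold:
            await self.emit("audio", pcm, user)


class FakeSTT(Emitter):
    """Pretends to transcribe: reports how many samples it received."""

    async def process_audio(self, pcm, user):
        await self.emit("transcript", f"<{len(pcm)} samples>", user)


async def run_pipeline(chunks, user="My User"):
    connection, vad, stt = Emitter(), FakeVAD(), FakeSTT()
    transcripts = []

    @connection.on("audio")
    async def on_pcm(pcm, user):
        await vad.process_audio(pcm, user)  # quiet chunks are dropped here

    @vad.on("audio")
    async def on_speech(pcm, user):
        await stt.process_audio(pcm, user)

    @stt.on("transcript")
    async def on_transcript(text, user):
        transcripts.append((user, text))

    for chunk in chunks:
        await connection.emit("audio", chunk, user)
    return transcripts


if __name__ == "__main__":
    silence = [0.0, 0.01, -0.02]
    speech = [0.0, 0.4, -0.3, 0.2]
    print(asyncio.run(run_pipeline([silence, speech, silence])))
```

Only the chunk that passes the fake VAD reaches the fake STT, which mirrors the "silence filtered here" comment in the README snippet.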

examples/stt_moonshine_transcription/env.example

Lines changed: 0 additions & 8 deletions
This file was deleted.
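
As far as this page shows, the commit deletes `env.example` while the new Quick start still runs `cp env.example .env`, so `.env` may have to be written by hand. A minimal sketch of a check that a `.env` text defines the expected keys follows. `STREAM_API_KEY` appears in the removed README; `STREAM_API_SECRET` and the helper `missing_env_keys` are assumptions made for illustration, since the README only says "fill STREAM_* keys".

```python
# STREAM_API_KEY is named in the old README; STREAM_API_SECRET is an assumption.
REQUIRED_KEYS = ("STREAM_API_KEY", "STREAM_API_SECRET")


def missing_env_keys(env_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in .env-style text."""
    values = {}
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment and surrounding quotes.
        value = value.split(" #", 1)[0].strip().strip("'\"")
        values[key.strip()] = value
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    sample = "STREAM_API_KEY=your_stream_api_key_here\n"
    print(missing_env_keys(sample))
```

Running the check against the contents of `.env` before `python main.py` would surface an unfilled key early rather than at join time.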
