README.md: 11 additions & 5 deletions
@@ -106,22 +106,28 @@ Get a free API key from [Stream](https://getstream.io/). Developers receive **33
 
 |**Plugin Name**|**Description**|**Docs Link**|
 |-------------|-------------|-----------|
-| AWS | AWS (Bedrock) integration with support for standard LLM (Qwen, Claude with vision), realtime with Nova 2 Sonic, and TTS with AWS Polly |[AWS](https://visionagents.ai/integrations/aws)|
+| AWS Bedrock | Realtime speech-to-speech plugin using Amazon Nova models with automatic reconnection |[AWS](https://visionagents.ai/integrations/aws-bedrock)|
+| AWS Polly | TTS plugin using Amazon's cloud-based service with natural-sounding voices and neural engine support |[AWS Polly](https://visionagents.ai/integrations/aws-polly)|
 | Cartesia | TTS plugin for realistic voice synthesis in real-time voice applications |[Cartesia](https://visionagents.ai/integrations/cartesia)|
-| Decart | Real-time video restyling capabilities using generative AI models|[Decart](https://visionagents.ai/integrations/decart)|
+| Decart | Real-time AI video transformation service for applying artistic styles and effects to video streams|[Decart](https://visionagents.ai/integrations/decart)|
 | Deepgram | STT plugin for fast, accurate real-time transcription with speaker diarization |[Deepgram](https://visionagents.ai/integrations/deepgram)|
 | ElevenLabs | TTS plugin with highly realistic and expressive voices for conversational agents |[ElevenLabs](https://visionagents.ai/integrations/elevenlabs)|
+| Fast-Whisper | High-performance STT plugin using OpenAI's Whisper model with CTranslate2 for fast inference |[Fast-Whisper](https://visionagents.ai/integrations/fast-whisper)|
 | Fish Audio | STT and TTS plugin with automatic language detection and voice cloning capabilities |[Fish Audio](https://visionagents.ai/integrations/fish)|
 | Gemini | Realtime API for building conversational agents with support for both voice and video |[Gemini](https://visionagents.ai/integrations/gemini)|
-| HeyGen | Realtime interactive avatars powered by [HeyGen](https://heygen.com/)|[Heygen](https://visionagents.ai/integrations/heygen)|
+| HeyGen | Realtime interactive avatars powered by [HeyGen](https://heygen.com/)|[HeyGen](https://visionagents.ai/integrations/heygen)|
 | Inworld | TTS plugin with high-quality streaming voices for real-time conversational AI agents |[Inworld](https://visionagents.ai/integrations/inworld)|
 | Kokoro | Local TTS engine for offline voice synthesis with low latency |[Kokoro](https://visionagents.ai/integrations/kokoro)|
 | Moondream | Moondream provides realtime detection and VLM capabilities. Developers can choose from using the hosted API or running locally on their CUDA devices. Vision Agents supports Moondream's Detect, Caption and VQA skills out-of-the-box. |[Moondream](https://visionagents.ai/integrations/moondream)|
 | OpenAI | Realtime API for building conversational agents with out of the box support for real-time video directly over WebRTC, LLMs and Open AI TTS |[OpenAI](https://visionagents.ai/integrations/openai)|
+| OpenRouter | LLM plugin providing access to multiple providers (Anthropic, Google, OpenAI) through a unified API |[OpenRouter](https://visionagents.ai/integrations/openrouter)|
+| Qwen | Realtime audio plugin using Alibaba's Qwen3 with native audio output and built-in speech recognition |[Qwen](https://visionagents.ai/integrations/qwen)|
+| Roboflow | Object detection processor using Roboflow's hosted API or local RF-DETR models |[Roboflow](https://visionagents.ai/integrations/roboflow)|
 | Smart Turn | Advanced turn detection system combining Silero VAD, Whisper, and neural models for natural conversation flow |[Smart Turn](https://visionagents.ai/integrations/smart-turn)|
+| Ultralytics | Real-time pose detection processor using YOLO models with skeleton overlays |[Ultralytics](https://visionagents.ai/integrations/ultralytics)|
 | Vogent | Neural turn detection system for intelligent turn-taking in voice conversations |[Vogent](https://visionagents.ai/integrations/vogent)|
 | Wizper | STT plugin with real-time translation capabilities powered by Whisper v3 |[Wizper](https://visionagents.ai/integrations/wizper)|
-| xAI |xAI (Grok) integration for using powerful language models in conversational AI applications|[xAI](https://visionagents.ai/integrations/xai)|
+| xAI |LLM plugin using xAI's Grok models with advanced reasoning and real-time knowledge|[xAI](https://visionagents.ai/integrations/xai)|
 
 
 ## Processors
@@ -230,7 +236,7 @@ While building the integrations, here are the limitations we've noticed (Dec 202
 
 ## We are hiring
 
-Join the team behind this project - we’re hiring a Staff Python Engineer to architect, build, and maintain a powerful toolkit for developers integrating voice and video AI into their products.
+Join the team behind this project - we’re hiring a Staff Python Engineer to architect, build, and maintain a powerful toolkit for developers integrating voice and video AI into their products.