How to use voice messages with OpenACP? #103
Can I send voice messages to control AI agents through OpenACP? How do I configure speech-to-text and text-to-speech?
Replies: 2 comments
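OpenACP supports sending voice messages to control AI agents and receiving spoken responses.

Speech-to-Text (STT): Uses Groq, which requires an API key.
Text-to-Speech (TTS): Uses Edge TTS, which is free and needs no API key.

Configure voice in ~/.openacp/config.json:

    {
      "speech": {
        "stt": {
          "provider": "groq",
          "providers": {
            "groq": {
              "apiKey": "YOUR_GROQ_API_KEY"
            }
          }
        },
        "tts": {
          "provider": "edge-tts"
        }
      }
    }

Voice mode has 3 states:
- off: TTS disabled
- next: TTS for next message only
- on: TTS always on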
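One way to picture the three voice mode states is as a small state machine. The sketch below is illustrative only: `VoiceMode` and `should_speak` are hypothetical names, not part of OpenACP, and it assumes that `next` falls back to `off` once a single reply has been spoken.

```python
from enum import Enum


class VoiceMode(Enum):
    OFF = "off"    # TTS disabled
    NEXT = "next"  # TTS for the next message only
    ON = "on"      # TTS always on


def should_speak(mode: VoiceMode) -> tuple[bool, VoiceMode]:
    """Return (speak this reply?, mode to use for the following reply)."""
    if mode is VoiceMode.ON:
        return True, VoiceMode.ON
    if mode is VoiceMode.NEXT:
        # Assumption: "next" is consumed by one reply, then reverts to "off".
        return True, VoiceMode.OFF
    return False, VoiceMode.OFF
```

Returning the follow-up mode alongside the decision keeps the function free of hidden state, so the caller decides where the current mode is stored.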
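When TTS is enabled, the agent automatically includes a [TTS]...[/TTS] block with a spoken-friendly summary. TTS limit: 5000 charact…

You send a voice message on Telegram → OpenACP transcribes it via Groq → sends the text to the agent → the agent responds → OpenACP converts the reply to audio and sends it back.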
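The flow above can be sketched end to end with stand-in functions. Nothing here is OpenACP's actual code: `transcribe`, `ask_agent`, and `synthesize` are hypothetical callables you would supply yourself (for example, wrappers around Groq and Edge TTS). The regex assumes the agent emits a literal `[TTS]...[/TTS]` pair, and the sketch assumes only that summary is spoken.

```python
import re
from typing import Callable, Optional, Tuple

# Matches the first [TTS]...[/TTS] block, including multi-line content.
TTS_BLOCK = re.compile(r"\[TTS\](.*?)\[/TTS\]", re.DOTALL)


def extract_tts(reply: str) -> Optional[str]:
    """Return the text inside the first [TTS]...[/TTS] block, or None."""
    match = TTS_BLOCK.search(reply)
    return match.group(1).strip() if match else None


def handle_voice_message(
    audio: bytes,
    transcribe: Callable[[bytes], str],   # hypothetical STT wrapper
    ask_agent: Callable[[str], str],      # hypothetical agent call
    synthesize: Callable[[str], bytes],   # hypothetical TTS wrapper
) -> Tuple[str, Optional[bytes]]:
    """Voice in -> text -> agent reply -> optional audio out."""
    text = transcribe(audio)
    reply = ask_agent(text)
    spoken = extract_tts(reply)
    # Only synthesize audio when the agent supplied a spoken summary.
    audio_out = synthesize(spoken) if spoken else None
    return reply, audio_out
```

Passing the three stages in as callables makes the flow easy to try with stubs before wiring in real STT and TTS providers.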
Great breakdown of the Groq STT + Edge TTS pipeline for Telegram voice messages! For anyone looking to extend this pattern beyond Telegram to actual phone calls (SIP/PSTN), the same concept applies but you need a telephony layer in between. This is where something like VoIPBin comes in — it is an open-source CPaaS built specifically for AI agents over phone calls. The architecture maps cleanly onto what OpenACP already does:
Key differences for telephony use cases:
Quick example with the Go SDK:

    import voipbin "github.com/voipbin/voipbin-go"

    // ctx is assumed to be a context.Context, e.g. context.Background().
    client := voipbin.NewClient("your-access-key")
    call, err := client.CallCreate(ctx, voipbin.CallCreateRequest{
        Source:      "sip:direct.myagent@sip.voipbin.net",
        Destination: "+15551234567",
    })

Docs: https://voipbin.net/skill.md
This could be a useful reference if you ever want to add a phone call channel alongside the Telegram voice channel.