Add together STT and TTS services by blainekasten · Pull Request #4054 · pipecat-ai/pipecat

blainekasten · 2026-03-17T12:24:50Z

Please describe the changes in your PR. If it is addressing an issue, please reference that as well.

markbackman · 2026-03-17T14:26:31Z

@blainekasten, we've changed enough in the base TTS service class and in how settings are handled that it's probably worth collaborating on my branch here:
#3904

I've updated this branch to align with those recent changes.

Revisiting it today, I see that the TTS service is not returning any audio. I've poked around a bit and written some standalone tests, but I can't get audio returned even when following the examples in your docs. Was there an API change, or could this be related to my account or API key? Can you help?

blainekasten · 2026-03-17T20:04:38Z

@markbackman sorry - let me put this in draft mode. I'm actively working on this

markbackman

I pushed a number of changes to these classes and have a working 07e version.

I think the issue was three-fold:

The STT service seems to take a while to warm up before it yields a transcript. Do you see this?
The Qwen model is really slow at inference.
The STT is very sensitive and prone to outputting false positives. This is a classic symptom of Whisper models: lots of "You" or "Thank you" produced. This is actually a pretty big problem for production use, so if you can tune this to reduce false positives, it will really help with adoption.
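To make the false-positive symptom concrete, here is a minimal client-side sketch of a filter that drops short filler transcripts. It is not part of this PR or of Pipecat. The names `SUSPECT_PHRASES`, `normalize`, and `is_probable_false_positive`, the `speech_seconds` input, and the 0.4 second threshold are all assumptions for illustration. Only the phrases "You" and "Thank you" come from the comment above.

```python
import re

# Hypothetical phrase list: only "You" and "Thank you" come from the review comment.
SUSPECT_PHRASES = {"you", "thank you"}


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so "Thank you." matches "thank you"."""
    return re.sub(r"[^\w\s]", "", text).strip().lower()


def is_probable_false_positive(
    text: str, speech_seconds: float, min_speech_seconds: float = 0.4
) -> bool:
    """Flag a known filler phrase that is backed by very little detected speech.

    speech_seconds is assumed to come from a VAD or similar source; the
    threshold is an arbitrary illustrative value, not a tuned one.
    """
    return normalize(text) in SUSPECT_PHRASES and speech_seconds < min_speech_seconds
```

A filter like this only masks the symptom on the client side. Tuning the service itself, as requested above, is the better fix.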

Also, I rebased on the latest main.

markbackman · 2026-03-21T02:28:43Z

        api_key=os.getenv("TOGETHER_API_KEY"),
        settings=TogetherLLMService.Settings(
-            model="Qwen/Qwen3.5-9B",
+            model="openai/gpt-oss-120b",


I found the Qwen model to be really slow, with issues producing inference. I've used the gpt-oss-120b model on another project and it worked well. It seems to work well here too.

markbackman · 2026-03-21T02:29:29Z

        # 1. Initialize default_settings with hardcoded defaults
-        default_settings = self.Settings(model=model)
+        default_settings = self.Settings(
+            model="openai/gpt-oss-120b",


Making "openai/gpt-oss-120b" the default. This follows the pattern we use to initialize settings.
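The default-then-apply-delta idea can be sketched roughly as follows. This is a standalone illustration, not Pipecat's actual classes. `SketchSettings`, `apply_delta`, `SketchLLMService`, and the `language="en"` default are hypothetical. Only the default model name comes from the diff.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class SketchSettings:
    """Hypothetical settings container; None means "not specified"."""

    model: Optional[str] = None
    language: Optional[str] = None


def apply_delta(base: SketchSettings, delta: Optional[SketchSettings]) -> SketchSettings:
    """Return base with every non-None field of delta applied on top."""
    if delta is None:
        return base
    overrides = {
        f.name: getattr(delta, f.name)
        for f in fields(delta)
        if getattr(delta, f.name) is not None
    }
    return replace(base, **overrides)


class SketchLLMService:
    Settings = SketchSettings

    def __init__(self, settings: Optional[SketchSettings] = None) -> None:
        # 1. Initialize default_settings with hardcoded defaults
        #    (the language default is an assumption for this sketch).
        default_settings = self.Settings(model="openai/gpt-oss-120b", language="en")
        # 2. Apply whatever the caller specified on top of the defaults.
        self._settings = apply_delta(default_settings, settings)
```

With this shape, a caller who passes only `SketchSettings(model=...)` overrides the model and still inherits the remaining defaults.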

markbackman · 2026-03-21T02:30:05Z

 from pipecat.utils.tracing.service_decorators import traced_stt

+# Together requires 16 kHz 16-bit mono PCM input.
+_TOGETHER_SAMPLE_RATE = 16000


Together only supports 16 kHz, so we're setting a constant and then using a resampler. This lets the service keep working if a user sets a different sample rate via PipelineParams.
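As a rough illustration of what resampling to the fixed rate involves, here is a pure-Python linear-interpolation sketch for 16-bit mono PCM. It is not the resampler used in this PR. `resample_pcm16_mono` and `TARGET_SAMPLE_RATE` are hypothetical names, native byte order is assumed, and linear interpolation is chosen for brevity rather than audio quality.

```python
import array

TARGET_SAMPLE_RATE = 16000  # mirrors the value of _TOGETHER_SAMPLE_RATE in the diff


def resample_pcm16_mono(audio: bytes, in_rate: int, out_rate: int = TARGET_SAMPLE_RATE) -> bytes:
    """Resample 16-bit mono PCM using linear interpolation (illustrative quality only).

    Assumes native byte order and an even number of input bytes.
    """
    if in_rate == out_rate or not audio:
        return audio
    samples = array.array("h")
    samples.frombytes(audio)
    out_len = len(samples) * out_rate // in_rate
    out = array.array("h", bytes(2 * out_len))  # zero-filled output buffer
    step = in_rate / out_rate  # input samples advanced per output sample
    last = len(samples) - 1
    for i in range(out_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        # Blend the two neighbouring input samples.
        out[i] = round(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out.tobytes()
```

For example, 10 ms of 48 kHz audio (480 samples) comes out as 160 samples, and audio already at 16 kHz is returned unchanged.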

markbackman · 2026-03-21T02:32:07Z

        """
        return True

+    async def _update_settings(self, delta: STTSettings) -> dict[str, Any]:


This allows for runtime updates, where supported.
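A minimal sketch of how a runtime update with reconnect support might look, assuming the method returns the fields that actually changed. `SketchSTTSettings`, `SketchSTTService`, `_reconnect`, and the placeholder defaults are hypothetical stand-ins, not the classes in this PR. Only the `_update_settings(delta) -> dict[str, Any]` shape comes from the diff.

```python
import asyncio
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class SketchSTTSettings:
    """Hypothetical settings; None means "leave unchanged"."""

    model: Optional[str] = None
    language: Optional[str] = None


class SketchSTTService:
    def __init__(self) -> None:
        # Placeholder defaults chosen only for this sketch.
        self._settings = SketchSTTSettings(model="placeholder-model", language="en")
        self.reconnect_count = 0

    async def _reconnect(self) -> None:
        # Stand-in for closing and reopening the service connection.
        self.reconnect_count += 1

    async def _update_settings(self, delta: SketchSTTSettings) -> dict[str, Any]:
        """Apply non-None fields from delta and return the ones that changed."""
        changed: dict[str, Any] = {}
        for f in fields(delta):
            new_value = getattr(delta, f.name)
            if new_value is not None and new_value != getattr(self._settings, f.name):
                setattr(self._settings, f.name, new_value)
                changed[f.name] = new_value
        if changed:
            # Assumption: any changed field needs a fresh connection to take effect.
            await self._reconnect()
        return changed


if __name__ == "__main__":
    service = SketchSTTService()
    print(asyncio.run(service._update_settings(SketchSTTSettings(language="fr"))))
```

Returning only the changed fields lets the caller skip the reconnect when an update repeats the current values.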

markbackman · 2026-03-21T02:32:36Z

    """

-    _settings: TogetherTTSSettings
+    Settings = TogetherTTSSettings


Same settings patterns here.

markbackman · 2026-03-21T02:33:05Z

-        logger.trace(f"{self}: flushing audio (context_id={context_id})")
-        await self._ws_send({"type": "input_text_buffer.commit"})
+        ctx_id = context_id or self._context_id
+        if not ctx_id or not self.audio_context_available(ctx_id):


Lots of context changes, which are now required. I applied the latest in this class.

markbackman · 2026-03-21T02:38:42Z

 SARVAM_TTFS_P99: float = 1.17
 SONIOX_TTFS_P99: float = 0.35
 SPEECHMATICS_TTFS_P99: float = 0.74
+TOGETHER_TTFS_P99: float = 2.028


I wonder if this value is so high because of the warm-up problem I think I observed in testing. SOTA services have a p50 around 0.3 sec and a p99 around 0.4 sec. Ideally, the p99 latency would be lower to remain competitive.
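For reference, p50 and p99 figures like these can be derived from a list of latency samples with a nearest-rank percentile. The sketch below is illustrative only. The `percentile` helper is hypothetical, and it is not how the TTFS constants in this file were measured.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to keep the rank exact for whole-number inputs.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]
```

A few slow warm-up requests at the start of a run would land in the tail and raise the p99 while leaving the p50 largely unchanged, which is consistent with the suspicion above.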

Rename 07e-interruptible-together.py to voice-together.py, add transcription-together.py, remove unused OpenAI import, and register voice-together in release evals.

Move model/language out of __init__ args into settings-based configuration with default-then-apply-delta pattern. Add Settings class attribute, language_to_service_language(), and _update_settings() with reconnect support.

- Use settings-based configuration (remove model/voice/language from __init__ args)
- Enable push_stop_frames, push_start_frame, pause_frame_processing to let base class manage TTS frame lifecycle
- Use append_to_audio_context() and get_active_audio_context_id() instead of manual context tracking
- Fix flush_audio signature to match base class
- Add language_to_service_language() and _update_settings() with reconnect support
- Remove unused voice arg from example

markbackman · 2026-04-01T22:56:26Z

+            )
+            headers = {
+                "Authorization": f"Bearer {self._api_key}",
+                "OpenAI-Beta": "realtime=v1",


Is this required?

markbackman self-requested a review March 17, 2026 14:29

blainekasten marked this pull request as draft March 17, 2026 20:04

markbackman force-pushed the add_together_stt_tts branch from fe84a88 to 7e22e23 Compare March 21, 2026 02:25

markbackman reviewed Mar 21, 2026


Comment thread src/pipecat/services/together/stt.py

markbackman reviewed Mar 21, 2026


blainekasten and others added 2 commits April 1, 2026 18:15

Add together STT and TTS services

a614ce6

Rename Together examples to match latest naming patterns and add eval

a2fffcb


markbackman force-pushed the add_together_stt_tts branch from fc7a19a to a2fffcb Compare April 1, 2026 22:22

markbackman added 2 commits April 1, 2026 18:34

Update Together STT to use latest settings patterns

405cf66


markbackman reviewed Apr 1, 2026

