
Add stream=true immediate audio mode for GET /tts#642

Open
thomassmith1969 wants to merge 3 commits into jamiepine:main from thomassmith1969:feature/tts-get-streaming

Conversation


thomassmith1969 commented May 12, 2026

Summary

  • Add a stream query flag to GET /tts so callers can request immediate audio streaming.
  • Keep existing JSON/status-link behavior as the default when stream is not set.
  • Document the new GET /tts stream behavior in backend API docs.
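For callers, stream mode is just one extra query parameter. The following is a minimal client-side sketch, assuming a server at a placeholder base URL. `build_tts_url` and `save_stream` are hypothetical helpers, not part of the PR. Other query parameters (profile, engine, and so on) are passed through unchanged rather than assumed by name:

```python
from typing import Callable, Iterable
from urllib.parse import urlencode


def build_tts_url(base_url: str, text: str, stream: bool = False, **extra: str) -> str:
    """Build a GET /tts URL; stream=true is only added when explicitly requested,
    so the default request keeps the existing JSON/status-link behavior."""
    params = {"text": text, **extra}  # extra keys are passed through untouched
    if stream:
        params["stream"] = "true"
    return f"{base_url.rstrip('/')}/tts?{urlencode(params)}"


def save_stream(chunks: Iterable[bytes], write: Callable[[bytes], object]) -> int:
    """Hand each non-empty audio chunk to `write` as it arrives; return total bytes."""
    total = 0
    for chunk in chunks:
        if chunk:
            write(chunk)
            total += len(chunk)
    return total
```

With the standard library, a live call could look like this: `with urllib.request.urlopen(build_tts_url(base, "Hello", stream=True)) as resp, open("out.wav", "wb") as f: save_stream(iter(lambda: resp.read(4096), b""), f.write)`. Because the chunks are written as they arrive, saving (or playback) can begin before generation finishes.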

What changed

  • backend/routes/speak.py
    • Add stream=true query flag on GET /tts.
    • Route stream=true requests to the streaming generation path.
    • Preserve default engine fallback to LuxTTS when engine/model is omitted.
  • backend/routes/generations.py
    • Update the /generate/stream implementation to emit WAV progressively (chunked response) instead of buffering the full output before sending.
  • backend/README.md
    • Add usage docs and examples for GET /tts stream mode.

Testing

  • python3 -m py_compile backend/routes/generations.py backend/routes/speak.py
  • Live endpoint smoke tests against Docker deployment:
    • GET /tts?stream=true returns audio/wav with transfer-encoding: chunked
    • Non-stream GET /tts still returns JSON metadata links
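The two smoke-test expectations for stream mode can be turned into a reusable check. The sketch below assumes the response headers are available as a plain mapping (for example, `dict(resp.headers)` from `urllib.request`). `is_chunked_wav` is a hypothetical helper and is not part of the PR:

```python
from typing import Mapping


def is_chunked_wav(headers: Mapping[str, str]) -> bool:
    """True if headers match the stream=true expectation: audio/wav sent chunked."""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}  # header names are case-insensitive
    content_type = lowered.get("content-type", "").split(";")[0].strip()
    encodings = [e.strip() for e in lowered.get("transfer-encoding", "").split(",")]
    return content_type == "audio/wav" and "chunked" in encodings
```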

Breaking changes

  • None. Existing GET /tts behavior is unchanged unless stream=true is provided.

Checklist

  • Code follows style guidelines
  • Documentation updated
  • Changes tested
  • No breaking changes (or documented)
  • CHANGELOG.md updated (changelog is release-managed in this repo)

AI Assistance Disclosure

This contribution was developed with assistance from GitHub Copilot (GPT-5.3-Codex). The contributor reviewed, tested, and approved all final code and documentation changes.

Summary by CodeRabbit

  • New Features

    • Query-based text-to-speech endpoint with voice/profile and engine selection, plus personality and language options
    • Three response modes: JSON status, real-time WAV streaming, and wait/poll until completion
  • Improvements

    • Real-time streaming now emits WAV bytes as generated with smoother chunk crossfades and per-chunk processing
    • Response headers adjusted to favor inline streaming (removed attachment behavior) and improved error propagation
  • Documentation

    • Added usage docs describing endpoint modes, streaming behavior, and profile requirements
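The three modes can be pictured as a small dispatch over the query flags. This is a hypothetical sketch inferred from the walkthrough and the sequence diagram: stream=true takes precedence, and wait=true then selects JSON, a redirect, or audio via `response`. It is not the actual handler in backend/routes/speak.py, and `TtsMode` and `select_mode` are illustrative names:

```python
from enum import Enum


class TtsMode(Enum):
    """Hypothetical labels for the GET /tts response modes."""
    JSON_STATUS = "json_status"      # default: 202 JSON with status/audio links
    STREAM = "stream"                # stream=true: chunked audio/wav
    WAIT_JSON = "wait_json"          # wait=true: poll, then return JSON
    WAIT_REDIRECT = "wait_redirect"  # wait=true&response=redirect: 307 to the audio URL
    WAIT_AUDIO = "wait_audio"        # wait=true&response=stream: return the audio bytes


# Assumed accepted values for the `response` parameter when wait=true.
_WAIT_MODES = {
    "json": TtsMode.WAIT_JSON,
    "redirect": TtsMode.WAIT_REDIRECT,
    "stream": TtsMode.WAIT_AUDIO,
}


def select_mode(stream: bool = False, wait: bool = False, response: str = "json") -> TtsMode:
    """Pick a response mode; stream wins, and the default stays the JSON status reply."""
    if stream:
        return TtsMode.STREAM
    if not wait:
        return TtsMode.JSON_STATUS
    try:
        return _WAIT_MODES[response.lower()]
    except KeyError:
        raise ValueError(f"unsupported response mode: {response!r}") from None
```

Keeping JSON_STATUS as the fall-through default is what makes the change non-breaking for existing callers.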


@coderabbitai
Contributor

coderabbitai Bot commented May 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0c86d0f5-a3c7-46db-9842-d0cffee4a9b8

📥 Commits

Reviewing files that changed from the base of the PR and between df1cddf and 59ac71d.

📒 Files selected for processing (1)
  • backend/routes/generations.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/routes/generations.py

📝 Walkthrough

Walkthrough

Adds a query-friendly GET /tts endpoint (JSON, stream, or wait modes) and refactors /generate/stream to emit per-chunk PCM with in-memory trim/effects/normalization and crossfade blending; README updated.
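The walkthrough mentions crossfade blending between adjacent chunks, but the PR's exact envelope isn't shown. The following is a minimal sketch of the general idea, assuming a linear fade over plain float lists instead of whatever array type the backend uses (an equal-power curve would be a common alternative). The tail of each chunk is held back so it can be mixed with the head of the next chunk, and audio is still yielded progressively. `crossfade_stream` is a hypothetical name:

```python
from typing import Iterable, Iterator, List


def crossfade_stream(chunks: Iterable[List[float]], fade: int) -> Iterator[List[float]]:
    """Yield samples chunk by chunk, blending each boundary with a linear crossfade.

    The last `fade` samples of each chunk are withheld and overlapped with the first
    samples of the next chunk; whatever tail remains is flushed at the end.
    """
    held: List[float] = []
    for chunk in chunks:
        samples = list(chunk)
        out: List[float] = []
        if held:
            n = min(len(held), len(samples))      # overlap limited by the shorter side
            out.extend(held[: len(held) - n])     # tail samples with nothing to overlap
            for i in range(n):
                w = (i + 1) / (n + 1)             # fade-in weight for the new chunk
                out.append(held[len(held) - n + i] * (1.0 - w) + samples[i] * w)
            samples = samples[n:]
        split = max(len(samples) - fade, 0)
        out.extend(samples[:split])               # safe to emit immediately
        held = samples[split:]                    # keep back for the next boundary
        if out:
            yield out
    if held:
        yield held
```

Each boundary shortens the output by the overlap length. This is inherent to overlap-add crossfading. Holding back only `fade` samples keeps the added latency per chunk small.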

Changes

TTS Query Endpoint with Streaming Refactor

  • /generate/stream refactoring: per-chunk PCM and crossfade blending (backend/routes/generations.py)
    • Generates audio per text chunk, converts float PCM to little-endian 16-bit PCM, emits a streaming WAV header with placeholder lengths, applies optional per-chunk trim/effects/normalize, blends adjacent chunk tails with crossfade envelopes, yields PCM bytes as they become available, and updates error chaining/imports.
  • GET /tts query endpoint with streaming, polling, and response modes (backend/routes/speak.py)
    • Adds GET /tts with query-based profile/voice and engine/model selection, personality/language options, and three response modes: immediate JSON (status/audio links), direct streaming (stream=true), and polling (wait=true), which returns JSON, a 307 redirect, or audio bytes depending on the response parameter. Integrates with the refactored streaming flow and the existing speak/history APIs.
  • README documentation for GET /tts (backend/README.md)
    • Documents the new endpoint, example curl commands for JSON and stream=true modes, response formats (status/audio links or chunked audio/wav), and profile-selection notes.
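The generations.py summary mentions two steps: converting float PCM to little-endian 16-bit PCM and emitting a WAV header with placeholder lengths. The following standard-library sketch illustrates both. It assumes mono 16-bit output and uses 0xFFFFFFFF as the placeholder, a common convention for unknown-length streams; the value the PR actually writes isn't shown here. All helper names are hypothetical:

```python
import struct
from typing import Iterable, Iterator, Sequence


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Clip floats to [-1, 1], scale by 32767, and pack as little-endian int16."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def streaming_wav_header(sample_rate: int, channels: int = 1, bits: int = 16,
                         placeholder: int = 0xFFFFFFFF) -> bytes:
    """44-byte RIFF/WAVE header; the RIFF and data sizes are placeholders because
    the total length is unknown when streaming starts."""
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    return (
        b"RIFF" + struct.pack("<I", placeholder) + b"WAVE"
        # fmt chunk: size 16, PCM format 1, channels, rate, byte rate, block align, bits
        + b"fmt " + struct.pack("<IHHIIHH", 16, 1, channels, sample_rate,
                                byte_rate, block_align, bits)
        + b"data" + struct.pack("<I", placeholder)
    )


def wav_stream(chunks: Iterable[Sequence[float]], sample_rate: int) -> Iterator[bytes]:
    """Yield the header first, then each chunk's PCM bytes as soon as it exists."""
    yield streaming_wav_header(sample_rate)
    for chunk in chunks:
        if len(chunk):
            yield float_to_pcm16(chunk)
```

A generator like `wav_stream` is the shape a chunked streaming response body consumes. Many decoders tolerate placeholder sizes and read until EOF, but a strict tool may reject a saved file unless its header is patched afterward.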

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant TTS_Get
  participant Resolve
  participant SpeakFlow
  participant StreamGen
  participant History
  participant Storage

  Client->>TTS_Get: GET /tts?text=...&stream=true|wait=true&response=...
  TTS_Get->>Resolve: determine engine/profile/personality
  alt stream=true
    TTS_Get->>StreamGen: stream_speech(...)
    StreamGen-->>Client: StreamingResponse audio/wav
  else not stream
    TTS_Get->>SpeakFlow: create generation (background)
    SpeakFlow-->>TTS_Get: generation id / status URLs
    TTS_Get-->>Client: 202 JSON (status/audio links)
    alt wait=true
      TTS_Get->>History: poll history.get_generation(id) (poll_ms)
      History-->>TTS_Get: completed | failed | missing
      alt completed && response=redirect
        TTS_Get-->>Client: 307 redirect to audio URL
      else completed && response=stream
        TTS_Get->>Storage: get_audio(...)
        Storage-->>TTS_Get: audio bytes
        TTS_Get-->>Client: audio bytes
      else failed
        TTS_Get-->>Client: 500 JSON
      end
    end
  end
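The wait=true branch polls `history.get_generation(id)` every `poll_ms` until the generation completes or fails. The following is a generic sketch of that loop. The status lookup is injected so the loop can be tested without a database. The timeout, the default interval, the status strings, and treating a missing record (None) as terminal are all assumptions. `wait_for_generation` is a hypothetical name:

```python
import time
from typing import Callable, Optional

TERMINAL = ("completed", "failed")


def wait_for_generation(
    get_status: Callable[[], Optional[str]],
    poll_ms: int = 250,
    timeout_s: float = 60.0,
    sleep: Callable[[float], object] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until done; returns "completed", "failed", "missing", or "timeout"."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status is None:
            return "missing"        # assumption: a vanished record is terminal
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            return "timeout"        # assumption: bounded wait instead of polling forever
        sleep(poll_ms / 1000)
```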

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

"🐰 I hopped through text and bytes today,
Chunk by chunk I stitched the sound's bright way,
Stream, poll, or fetch — the endpoint sings,
Little waves of audio on nimble wings."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 75.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check (✅ Passed): The title directly and specifically describes the main change: adding a stream=true mode for the GET /tts endpoint to enable immediate audio streaming.
  • Linked Issues check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check (✅ Passed): Check skipped because no linked issues were found for this pull request.


Contributor

coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/routes/generations.py`:
- Around line 388-389: Several except blocks catch ValueError and re-raise
HTTPException without chaining; update each "except ValueError as e:" block that
currently does "raise HTTPException(status_code=400, detail=str(e))" to use
exception chaining by adding "from e" (i.e., "raise
HTTPException(status_code=400, detail=str(e)) from e"). Locate all such handlers
(the except ValueError blocks and the corresponding raise HTTPException calls)
and apply the change so the original traceback is preserved.
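The requested fix is standard Python exception chaining. `raise ... from e` stores the ValueError in the new exception's `__cause__`, so tracebacks show both errors. The sketch below is self-contained. The `HTTPException` class is a local stand-in for `fastapi.HTTPException`, which also takes `status_code` and `detail`. `parse_positive_float` and `handle_request` are hypothetical:

```python
class HTTPException(Exception):
    """Stand-in for fastapi.HTTPException so the sketch runs without FastAPI."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def parse_positive_float(raw: str) -> float:
    """Hypothetical validator: raises ValueError on bad input."""
    value = float(raw)  # float() itself raises ValueError for non-numeric text
    if value <= 0:
        raise ValueError("value must be positive")
    return value


def handle_request(raw: str) -> float:
    try:
        return parse_positive_float(raw)
    except ValueError as e:
        # "from e" sets __cause__, preserving the original traceback in logs
        raise HTTPException(status_code=400, detail=str(e)) from e
```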

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b15030b9-56db-4b97-aae6-5e48e38ece3c

📥 Commits

Reviewing files that changed from the base of the PR and between b35b909 and df1cddf.

📒 Files selected for processing (3)
  • backend/README.md
  • backend/routes/generations.py
  • backend/routes/speak.py

Comment thread backend/routes/generations.py Outdated
@thomassmith1969
Author

Follow-up with corrected formatting:

Addressed the CodeRabbit exception-chaining feedback in commit 59ac71d.

Updated all ValueError-to-HTTPException(400) mappings in backend/routes/generations.py to use raise ... from e at the four reported locations.

Validation run:

  • python3 -m py_compile backend/routes/generations.py
  • VS Code diagnostics: no errors in the updated file.
