2 changes: 1 addition & 1 deletion README.md
@@ -325,4 +325,4 @@ Join the **[No-Code Architects Community](https://www.skool.com/no-code-architec

## License

This project is licensed under the [GNU General Public License v2.0 (GPL-2.0)](LICENSE).
This project is licensed under the [GNU General Public License v2.0 (GPL-2.0)](LICENSE).
218 changes: 218 additions & 0 deletions docs/audio/speech.md
@@ -0,0 +1,218 @@
# TTS (Text-to-Speech) API Endpoint Documentation

**Implemented by:** [Harrison Fisher](https://github.com/HarrisonFisher)

## Overview

The `/v1/audio/speech` endpoint converts text into speech using one of two Text-to-Speech (TTS) engines, `edge-tts` (the default) or `streamlabs-polly`, each with its own set of voices. Requests go through the application's queuing system, so long-running synthesis jobs can be processed asynchronously and their results delivered to a webhook.

## Endpoint

- **URL**: `/v1/audio/speech`
- **Method**: `POST`

## Request

### Headers

- `x-api-key`: Required. Your API authentication key.

### Body Parameters

| Parameter | Type | Required | Description |
|---------------|--------|----------|-------------|
| `tts` | String | No | The TTS engine to use. Default is `edge-tts`. Options: `edge-tts`, `streamlabs-polly` |
| `text` | String | Yes | The text to convert to speech. |
| `voice` | String | No | The voice to use. The valid voice list depends on the TTS engine. |
| `webhook_url` | String | No | A URL to receive a callback notification when processing is complete. If provided, the request will be processed asynchronously. |
| `id` | String | No | A custom identifier for tracking the request. |

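The route's schema in `routes/v1/audio/speech.py` requires only `text`, restricts `tts` to the two engines, types every field as a string, and sets `additionalProperties` to `False`, so a client should send only the keys it actually uses. The sketch below is a hypothetical client-side helper (`build_speech_payload` is not part of this API) that drops unset fields instead of sending `null`, which a string-typed schema field would likely reject with a 400.

```python
from typing import Optional

# Engines accepted by the route's JSON schema (the `enum` on `tts`).
SUPPORTED_ENGINES = ("edge-tts", "streamlabs-polly")


def build_speech_payload(
    text: str,
    tts: Optional[str] = None,
    voice: Optional[str] = None,
    webhook_url: Optional[str] = None,
    request_id: Optional[str] = None,
) -> dict:
    """Build a JSON body for POST /v1/audio/speech (hypothetical client helper).

    Unset optional fields are left out rather than sent as null, since the
    schema types every field as a string and allows no extra properties.
    """
    if not text:
        raise ValueError("'text' is required and must be non-empty")
    if tts is not None and tts not in SUPPORTED_ENGINES:
        raise ValueError(f"unsupported tts engine: {tts!r}")

    optional_fields = {
        "tts": tts,
        "voice": voice,
        "webhook_url": webhook_url,
        "id": request_id,  # sent as `id`; named request_id here to avoid shadowing id()
    }
    payload = {"text": text}
    payload.update({k: v for k, v in optional_fields.items() if v is not None})
    return payload
```

For example, `build_speech_payload("Hello, world!", voice="en-US-AvaNeural")` returns `{"text": "Hello, world!", "voice": "en-US-AvaNeural"}`.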
### Available Voices

- For `edge-tts` (default TTS engine):
  - Default voice: "en-US-AvaNeural"
  - Supports a wide range of voices in multiple languages.
  - Many voices can be previewed at https://tts.travisvn.com/ (a third-party site).
  - Examples: "en-US-AvaNeural", "en-GB-SoniaNeural", "es-ES-ElviraNeural".

- For `streamlabs-polly`:
  - Default voice: "Brian"
  - Available voices:
    ```
    Brian, Emma, Russell, Joey, Matthew, Joanna, Kimberly,
    Amy, Geraint, Nicole, Justin, Ivy, Kendra, Salli, Raveena
    ```

> To get a complete list of available voices for either engine, make a request with an invalid voice name: the error response will include all valid voices.

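To catch an invalid voice before a request is sent, a client can check the documented `streamlabs-polly` names locally. This is a minimal sketch assuming the list above is complete for your deployment; `resolve_voice` is a hypothetical helper, and `edge-tts` voices are passed through for the server to validate because that engine's catalog is too large to hard-code.

```python
from typing import Optional

# Defaults and streamlabs-polly voices as listed in this document.
DEFAULT_VOICES = {"edge-tts": "en-US-AvaNeural", "streamlabs-polly": "Brian"}
POLLY_VOICES = frozenset({
    "Brian", "Emma", "Russell", "Joey", "Matthew", "Joanna", "Kimberly",
    "Amy", "Geraint", "Nicole", "Justin", "Ivy", "Kendra", "Salli", "Raveena",
})


def resolve_voice(tts: str = "edge-tts", voice: Optional[str] = None) -> str:
    """Return the voice to send for `tts`, failing early on known-bad combinations."""
    if tts not in DEFAULT_VOICES:
        raise ValueError(f"unsupported tts engine: {tts!r}")
    if voice is None:
        # Make the documented default explicit instead of relying on the server.
        return DEFAULT_VOICES[tts]
    if tts == "streamlabs-polly" and voice not in POLLY_VOICES:
        raise ValueError(f"{voice!r} is not a streamlabs-polly voice")
    # edge-tts offers too many voices to hard-code; let the server validate them.
    return voice
```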
### Example Request

```json
{
  "tts": "edge-tts",
  "text": "Hello, world!",
  "voice": "en-US-AvaNeural",
  "webhook_url": "https://your-webhook-endpoint.com/callback",
  "id": "custom-request-id-123"
}
```

### Example cURL Command

```bash
curl -X POST \
  https://api.example.com/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -H 'x-api-key: your-api-key-here' \
  -d '{
    "tts": "edge-tts",
    "text": "Hello, world!",
    "voice": "en-US-AvaNeural",
    "webhook_url": "https://your-webhook-endpoint.com/callback",
    "id": "custom-request-id-123"
  }'
```
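
The same request can be built with Python's standard library. This sketch mirrors the cURL example; `build_speech_request` and `send_speech_request` are hypothetical helper names, and the 120-second timeout is an arbitrary assumption sized for synchronous requests.

```python
import json
import urllib.request


def build_speech_request(base_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build the same POST request as the cURL example, without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send_speech_request(request: urllib.request.Request, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON body of a 2xx response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

`urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses; the error object is file-like, so `err.read()` returns the JSON error body described under Error Responses.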

## Response

### Synchronous Response (No webhook\_url provided)

If no `webhook_url` is provided, the request will be processed synchronously and return:

```json
{
  "code": 200,
  "id": "custom-request-id-123",
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "response": "https://storage.example.com/audio-file.mp3",
  "message": "success",
  "run_time": 2.345,
  "queue_time": 0,
  "total_time": 2.345,
  "pid": 12345,
  "queue_id": 67890,
  "queue_length": 0,
  "build_number": "1.0.123"
}
```

### Asynchronous Response (webhook\_url provided)

If a `webhook_url` is provided, the request will be queued for processing and immediately return:

```json
{
  "code": 202,
  "id": "custom-request-id-123",
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "processing",
  "pid": 12345,
  "queue_id": 67890,
  "max_queue_length": "unlimited",
  "queue_length": 1,
  "build_number": "1.0.123"
}
```

When processing is complete, a webhook will be sent to the provided URL with the following payload:

```json
{
  "endpoint": "/v1/audio/speech",
  "code": 200,
  "id": "custom-request-id-123",
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "response": "https://storage.example.com/audio-file.mp3",
  "message": "success",
  "pid": 12345,
  "queue_id": 67890,
  "run_time": 3.456,
  "queue_time": 1.234,
  "total_time": 4.690,
  "queue_length": 0,
  "build_number": "1.0.123"
}
```
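
On the receiving side, the webhook payload can be parsed with the standard library alone. This is a minimal sketch, not a production server: the port, the `parse_speech_webhook`, `SpeechWebhookHandler`, and `serve` names, and the handling of failed jobs are assumptions (it assumes a failed job arrives with a non-200 `code` and a `message`, mirroring the error responses).

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_speech_webhook(body: bytes) -> dict:
    """Extract the fields a client usually needs from a completion webhook."""
    payload = json.loads(body)
    result = {"id": payload.get("id"), "job_id": payload.get("job_id")}
    if payload.get("code") == 200:
        result["audio_url"] = payload["response"]
    else:
        # Assumption: failed jobs arrive with a non-200 `code` and a `message`.
        result["error"] = payload.get("message", "unknown error")
    return result


class SpeechWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = parse_speech_webhook(self.rfile.read(length))
        except (ValueError, KeyError, AttributeError):
            # Malformed JSON, a missing `response`, or a body that is not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        # Replace this with your own persistence (database, task queue, ...).
        self.log_message("speech job %s: %s", result["job_id"], result)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen on all interfaces until interrupted (hypothetical entry point)."""
    HTTPServer(("", port), SpeechWebhookHandler).serve_forever()
```

Calling `serve()` listens on port 8000; point `webhook_url` at that host.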

### Error Responses

#### Invalid Request Format (400 Bad Request)

```json
{
  "code": 400,
  "id": null,
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Invalid request: 'text' is a required property",
  "pid": 12345,
  "queue_id": 67890,
  "queue_length": 0,
  "build_number": "1.0.123"
}
```

#### Authentication Error (401 Unauthorized)

```json
{
  "code": 401,
  "message": "Invalid or missing API key",
  "build_number": "1.0.123"
}
```

#### Queue Limit Reached (429 Too Many Requests)

```json
{
  "code": 429,
  "id": "custom-request-id-123",
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "MAX_QUEUE_LENGTH (100) reached",
  "pid": 12345,
  "queue_id": 67890,
  "queue_length": 100,
  "build_number": "1.0.123"
}
```

#### Processing Error (500 Internal Server Error)

```json
{
  "code": 500,
  "id": "custom-request-id-123",
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "message": "Error downloading audio file: Connection refused",
  "pid": 12345,
  "queue_id": 67890,
  "queue_length": 0,
  "build_number": "1.0.123"
}
```

## Error Handling

* **Missing Required Parameters**: If `text` is missing or empty, a 400 Bad Request response will be returned.
* **Invalid TTS Engine**: If the `tts` parameter is invalid (i.e., not `edge-tts` or `streamlabs-polly`), a 400 Bad Request response will be returned.
* **Authentication Failure**: If the API key is invalid or missing, a 401 Unauthorized response will be returned.
* **Queue Limit**: If the queue is full (when MAX\_QUEUE\_LENGTH is set), a 429 Too Many Requests response will be returned.
* **Processing Errors**: Any errors during text processing, speech synthesis, or audio file generation will result in a 500 Internal Server Error response with details in the message field.

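The status codes above can be mapped to client behavior in one place. This is a sketch: `interpret_speech_response` and `SpeechAPIError` are hypothetical names, and treating 429 and 5xx as retryable is an assumption rather than documented API behavior.

```python
import json


class SpeechAPIError(Exception):
    """Raised for non-2xx responses from /v1/audio/speech."""

    def __init__(self, status: int, message: str):
        super().__init__(f"HTTP {status}: {message}")
        self.status = status
        self.message = message
        # Assumption: a full queue (429) or a server error (5xx) may succeed later;
        # 400 and 401 will keep failing until the request or key is fixed.
        self.retryable = status == 429 or status >= 500


def interpret_speech_response(status: int, body: str) -> dict:
    """Turn an HTTP status and response body into a small result dict."""
    try:
        payload = json.loads(body) if body else {}
    except ValueError:  # e.g. an HTML error page from a proxy in front of the API
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    if status == 200:
        # Synchronous success: `response` holds the uploaded audio URL.
        return {"state": "done", "job_id": payload.get("job_id"), "audio_url": payload["response"]}
    if status == 202:
        # Queued: the result will be delivered to `webhook_url`.
        return {"state": "queued", "job_id": payload.get("job_id")}
    raise SpeechAPIError(status, payload.get("message", body or "no message"))
```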
## Usage Notes

1. **Asynchronous Processing**: For longer processing times (e.g., generating speech from large texts), it's recommended to use the `webhook_url` parameter for asynchronous processing.
2. **Queue Behavior**: Requests that include `webhook_url` are queued and processed in the background. The `MAX_QUEUE_LENGTH` environment variable caps the queue size; once it is reached, new requests receive a 429 Too Many Requests response.

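When `MAX_QUEUE_LENGTH` is reached the API answers 429 instead of queuing the job, so a client can retry with exponential backoff. In this sketch, `post_with_backoff` is a hypothetical helper and `send` is a caller-supplied function that performs one request and returns `(status, body)`; the attempt count and base delay are arbitrary defaults.

```python
import random
import time
from typing import Callable, Tuple


def post_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call `send` until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        # Return on anything other than "queue full", or on the final attempt.
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        # Exponential backoff (1s, 2s, 4s, ...) plus up to `base_delay` of jitter.
        sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise AssertionError("unreachable")
```

Jitter keeps many clients from retrying at the same moment, and `sleep` is injectable so the retry logic can be tested without waiting.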
## Common Issues

1. **Invalid Voice**: Make sure the selected voice is valid for the chosen TTS engine.
2. **Webhook Failures**: If your webhook endpoint is unavailable when processing completes, you might not receive the completion notification.
3. **Timeout Issues**: Long texts or heavy load might cause timeouts during speech synthesis.

## Best Practices

1. **Use Webhooks for Large Texts**: Consider using the webhook approach for large text-to-speech requests to avoid timeouts.
2. **Include an ID**: Always include a custom `id` parameter to help track your requests, especially in webhook responses.
3. **Error Handling**: Implement robust error handling to manage various HTTP status codes.
4. **Webhook Reliability**: Ensure your webhook endpoint is reliable and can handle retries if necessary.
5. **Text Chunking**: If you're processing large bodies of text, split it into smaller requests (for example, at sentence boundaries) to avoid exceeding character limits.
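
A simple way to chunk text is to split at sentence boundaries and pack sentences into pieces under a size limit. This document does not state a specific character limit, so `max_chars=3000` below is a placeholder assumption; `chunk_text` is a hypothetical helper, and each chunk would be sent as its own request.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 3000) -> List[str]:
    """Split `text` into chunks of at most `max_chars`, preferring sentence breaks."""
    if max_chars < 1:
        raise ValueError("max_chars must be at least 1")
    # Split after ., ! or ? followed by whitespace; the punctuation stays with its sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```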
3 changes: 2 additions & 1 deletion requirements.txt
@@ -17,4 +17,5 @@ psutil
boto3
Pillow
matplotlib
yt-dlp
yt-dlp
edge-tts
72 changes: 72 additions & 0 deletions routes/v1/audio/speech.py
@@ -0,0 +1,72 @@
# Copyright (c) 2025 Stephen G. Pope
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License along
# with this program; if not, write to the Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

###########################################################################


# Author: Harrison Fisher (https://github.com/HarrisonFisher)
# Date: May 2025
# Created new route: /v1/audio/speech

from flask import Blueprint, request, jsonify
from app_utils import validate_payload, queue_task_wrapper
import logging
from services.authentication import authenticate
from services.cloud_storage import upload_file
from services.v1.audio.speech import generate_tts
import os

v1_audio_speech_bp = Blueprint("v1_audio_speech", __name__)
logger = logging.getLogger(__name__)

@v1_audio_speech_bp.route("/v1/audio/speech", methods=["POST"])
@authenticate
@validate_payload({
    "type": "object",
    "properties": {
        "tts": {"type": "string", "enum": ["edge-tts", "streamlabs-polly"]},
        "text": {"type": "string"},
        "voice": {"type": "string"},
        "webhook_url": {"type": "string", "format": "uri"},
        "id": {"type": "string"},
    },
    "required": ["text"],
    "additionalProperties": False
})
@queue_task_wrapper(bypass_queue=False)
def text_to_speech(job_id, data):
    tts = data.get("tts", "edge-tts")
    text = data["text"]
    voice = data.get("voice")
    webhook_url = data.get("webhook_url")

    logger.info(f"Job {job_id}: Received TTS request for text length {len(text)}")

    file = None  # defined before `try` so the cleanup in `finally` never sees an unbound name
    try:
        file = generate_tts(tts, text, voice, job_id)
        cloud_url = upload_file(file)
        logger.info(f"Job {job_id}: TTS audio uploaded to cloud storage: {cloud_url}")
        return cloud_url, "/v1/audio/speech", 200
    except Exception as e:
        logger.error(f"Job {job_id}: Error during TTS process - {str(e)}")
        return str(e), "/v1/audio/speech", 500
    finally:
        try:
            if file and os.path.exists(file):  # skip cleanup if generation failed early
                os.remove(file)
        except Exception as cleanup_error:
            logger.warning(f"Job {job_id}: Cleanup failed: {cleanup_error}")