
Commit 00360ee

chore: tune gunicorn worker settings (#1)
1 parent: 4fe467a

5 files changed: 7 additions, 2 deletions


Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -29,4 +29,4 @@ HEALTHCHECK --interval=30s --timeout=10s --start-period=10s --retries=3 \
 CMD curl -f http://localhost:8000/api/health || exit 1
 
 # Run the application
-CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
+CMD ["sh", "-c", "exec gunicorn app.main:app -w ${APP_WORKERS:-3} -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000 --timeout 120 --graceful-timeout 30"]
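The new CMD packs every Gunicorn setting into one shell string. For comparison, here is a sketch of the same settings in Gunicorn's Python config-file form. The file name `gunicorn.conf.py` and the `env_int` helper are assumptions for illustration, not part of this commit. Each CLI flag maps to a module-level setting: `-w` to `workers`, `-k` to `worker_class`, `--bind` to `bind`, `--timeout` to `timeout`, and `--graceful-timeout` to `graceful_timeout`.

```python
# gunicorn.conf.py (hypothetical): config-file equivalent of the CMD flags above.
import os


def env_int(name: str, default: int) -> int:
    """Read an integer env var, falling back when it is unset *or* empty,
    which mirrors the shell's ${NAME:-default} behaviour."""
    raw = os.environ.get(name)
    return int(raw) if raw else default


bind = "0.0.0.0:8000"
workers = env_int("APP_WORKERS", 3)              # -w ${APP_WORKERS:-3}
worker_class = "uvicorn.workers.UvicornWorker"   # -k
timeout = 120          # workers silent for longer than this are killed and restarted
graceful_timeout = 30  # time workers get to finish in-flight requests on restart/stop
```

With this file in place, the container command would shrink to `gunicorn -c gunicorn.conf.py app.main:app`. The trade-off is one more file to copy into the image in exchange for a CMD that no longer needs `sh -c`.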

README.md

Lines changed: 2 additions & 0 deletions
@@ -229,6 +229,7 @@ The main latency knobs are:
 | `STT_INITIAL_PROMPT` | empty | Optional Whisper prompt for domain terms, names, and expected vocabulary. |
 | `whisper.ping_interval_seconds` | `null` | App-to-STT WebSocket ping interval. `null` disables client pings, which avoids timeouts during long local model inference. |
 | `whisper.ping_timeout_seconds` | `null` | App-to-STT WebSocket ping timeout. |
+| `APP_WORKERS` | `3` | Number of PolyTalk app Gunicorn workers. Increase for more concurrent sessions after checking CPU and memory headroom. |
 | `STT_WORKERS` | `1` | Number of STT web workers. Each worker loads its own Whisper model. |
 | `STT_PRELOAD_MODEL` | `true` | Load the Whisper model during STT startup instead of delaying the first stream. |
 | `STT_CHUNK_OVERLAP_SECONDS` | `0.25` | Audio overlap between STT windows. Helps avoid missing words at chunk boundaries. |
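The "memory headroom" check for `APP_WORKERS` (and for `STT_WORKERS`, where each worker loads its own Whisper model) comes down to simple arithmetic. The sketch below assumes memory grows roughly linearly with worker count. `max_workers_for_memory` is a hypothetical helper, and every number passed to it is something you would measure on your own host, for example per-container usage from `docker stats`.

```python
def max_workers_for_memory(available_mb: int, per_worker_mb: int, reserve_mb: int = 0) -> int:
    """Largest worker count whose combined memory fits in the budget.

    Assumes each worker costs roughly per_worker_mb (e.g. one Whisper model
    per STT worker). All inputs are your own measurements, not PolyTalk defaults.
    """
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    budget = available_mb - reserve_mb
    return max(budget // per_worker_mb, 0)


# Hypothetical figures: 8000 MB free, 1500 MB per worker, 1000 MB kept for everything else.
print(max_workers_for_memory(8000, 1500, reserve_mb=1000))  # 4
```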
@@ -241,6 +242,7 @@ The main latency knobs are:
 | `translation.model` | `qwen3-8b` | Use a model supported by your provider or self-hosted server, such as qwen3-8b, TranslateGama, or another open-source/open-weight model. |
 | `translation.max_tokens` | `240` | Maximum translation output tokens. Keep bounded for live streaming, but allow enough room for Indic-script targets and longer sentence buffers. |
 | `tts.timeout_seconds` | `10` | Maximum wait for TTS generation. |
+| `TTS_WORKERS` | `4` | Number of Piper Gunicorn workers. Keep `2-4` on small hosts; raise toward `min(8, CPU cores)` only after CPU and memory headroom are confirmed. |
 
 For larger continuous-speech translation chunks, start with:
 
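The `TTS_WORKERS` guidance gives a concrete ceiling, `min(8, CPU cores)`. The following minimal sketch encodes that rule and clamps a requested value into range. The function names are hypothetical. Note that `os.cpu_count()` may report the host's cores rather than a container CPU limit, so passing `cpu_cores` explicitly is safer inside Docker.

```python
import os
from typing import Optional

HARD_CAP = 8  # README: "raise toward min(8, CPU cores)"


def tts_worker_ceiling(cpu_cores: Optional[int] = None) -> int:
    """Upper bound suggested by the README: min(8, CPU cores)."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return min(HARD_CAP, cores)


def clamp_tts_workers(requested: int, cpu_cores: Optional[int] = None) -> int:
    """Clamp a requested TTS_WORKERS value into the range [1, ceiling]."""
    return max(1, min(requested, tts_worker_ceiling(cpu_cores)))


# Default of 4 on a hypothetical 2-core host is clamped down to 2.
print(clamp_tts_workers(4, cpu_cores=2))  # 2
```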
docker-compose.yml

Lines changed: 2 additions & 0 deletions
@@ -65,6 +65,7 @@ services:
 environment:
 - PIPER_MODEL=${TTS_MODEL:-en_GB-jenny_dioco-medium}
 - PIPER_DATA_DIR=/data
+- TTS_WORKERS=${TTS_WORKERS:-4}
 volumes:
 - ./tts/wsgi.py:/app/wsgi.py:ro
 - ./tts/voices:/data:ro
@@ -93,6 +94,7 @@ services:
 # Application
 - APP_HOST=0.0.0.0
 - APP_PORT=8000
+- APP_WORKERS=${APP_WORKERS:-3}
 - APP_DEBUG=${APP_DEBUG:-false}
 - ALLOWED_ORIGINS=${ALLOWED_ORIGINS:-http://localhost:9000,http://127.0.0.1:9000}
 - LOG_LEVEL=${LOG_LEVEL:-INFO}
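Compose resolves `${TTS_WORKERS:-4}` and `${APP_WORKERS:-3}` on the host, from the shell environment or a `.env` file, before the containers start. The `:-` form falls back to the default when the variable is unset *or* empty. The plain `-` form falls back only when the variable is unset. The sketch below re-implements just those two forms to make the difference concrete. `interpolate` is a hypothetical helper, not Compose's actual parser, and it ignores other syntax such as `${NAME}` or `${NAME:?err}`.

```python
import re
from typing import Mapping

# Matches ${NAME:-default} and ${NAME-default}; the default may not contain "}".
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(:?-)([^}]*)\}")


def interpolate(value: str, env: Mapping[str, str]) -> str:
    """Expand ${NAME:-default} (unset or empty) and ${NAME-default} (unset only)."""

    def substitute(match: "re.Match[str]") -> str:
        name, op, default = match.groups()
        current = env.get(name)
        if op == ":-":
            return current if current else default
        return current if current is not None else default

    return _PATTERN.sub(substitute, value)


print(interpolate("TTS_WORKERS=${TTS_WORKERS:-4}", {"TTS_WORKERS": ""}))  # TTS_WORKERS=4
print(interpolate("TTS_WORKERS=${TTS_WORKERS-4}", {"TTS_WORKERS": ""}))   # TTS_WORKERS=
```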

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 # Web framework
 fastapi>=0.109.0
 uvicorn[standard]>=0.27.0
+gunicorn>=22.0.0
 python-multipart>=0.0.6
 
 # Templates

tts/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -23,4 +23,4 @@ COPY voices/ /data/
 
 EXPOSE 5000
 
-CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "wsgi:app"]
+CMD ["sh", "-c", "exec gunicorn --bind 0.0.0.0:5000 --workers ${TTS_WORKERS:-4} wsgi:app"]
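Both Dockerfiles now wrap Gunicorn in `sh -c "exec ..."`. Docker's exec-form (JSON array) `CMD` starts the program without a shell, so `${TTS_WORKERS:-4}` would reach Gunicorn as literal text. Running it through `sh -c` lets the shell expand the variable, and `exec` then replaces the shell so Gunicorn becomes the container's main process and receives the stop signal directly. The sketch below reproduces the expansion difference with `subprocess`. It assumes a POSIX `sh` and an `echo` binary on `PATH`, and `run_echo` is a hypothetical helper.

```python
import os
import subprocess
from typing import Dict

TEMPLATE = "${TTS_WORKERS:-4}"


def run_echo(env_overrides: Dict[str, str], use_shell: bool) -> str:
    """Echo TEMPLATE the way shell-form or exec-form CMD would pass it on."""
    env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin"), **env_overrides}
    if use_shell:
        # Like CMD ["sh", "-c", "exec ..."]: sh expands the parameter, then execs.
        argv = ["sh", "-c", f"exec echo {TEMPLATE}"]
    else:
        # Like a plain JSON CMD: the argument reaches the program verbatim.
        argv = ["echo", TEMPLATE]
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(run_echo({}, use_shell=False))  # ${TTS_WORKERS:-4}
print(run_echo({}, use_shell=True))   # 4
```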
