feat: add OpenAI-compatible Audio Speech API endpoint by pinghe · Pull Request #41 · OpenMOSS/MOSS-TTS-Nano

pinghe · 2026-04-24T13:06:41Z

Summary

Add OpenAI-compatible POST /v1/audio/speech endpoint for TTS
Fix torchaudio SoX backend segfault
Fix voice preset mappings referencing non-existent audio files
Enable GPU inference (was hardcoded to CPU)
Add timestamps to uvicorn access logs

Changes

New file: openai_audio_api.py

OpenAI /v1/audio/speech request/response models (SpeechRequest, make_error_response)
Voice mapping: OpenAI voice names to MOSS-TTS-Nano presets (alloy to Junhao, echo to Xiaoyu, etc.)
Audio format helpers: WAV header construction, PCM 16-bit encoding, MP3 encoding via lameenc
Streaming generators: iter_pcm_audio, generate_wav_stream, generate_mp3_stream, generate_pcm_stream
Supports wav, mp3, and pcm response formats
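
The audio format helpers listed above can be illustrated with a minimal sketch. The names `float_to_pcm16` and `make_wav_header` are hypothetical and are not taken from openai_audio_api.py; the 0xFFFFFFFF size placeholder for streams of unknown length is a common convention and only an assumption about what the PR does.

```python
import struct


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    # Clip first so out-of-range samples saturate instead of overflowing.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def make_wav_header(sample_rate=48000, channels=2, bits_per_sample=16, data_size=None):
    """Build a canonical 44-byte RIFF/WAVE header for uncompressed PCM.

    When data_size is None (length unknown while streaming), both size
    fields are set to 0xFFFFFFFF as a placeholder (assumed convention).
    """
    if data_size is None:
        riff_size = data_len = 0xFFFFFFFF
    else:
        data_len = data_size
        riff_size = 36 + data_size  # header bytes after the RIFF size field
    block_align = channels * bits_per_sample // 8
    byte_rate = sample_rate * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", riff_size, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate, byte_rate, block_align, bits_per_sample,
        b"data", data_len,
    )
```

A streaming WAV response would then send the header once, followed by PCM chunks as they are produced.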

app.py

Add OpenAI-compatible endpoint using background thread + queue streaming model
- Avoids holding _cpu_execution_lock inside ASGI streaming iterator (prevents deadlock on client disconnect)
- _put() has 30s deadline to prevent threads from blocking indefinitely when client disconnects
- Explicit events_gen.close() to release the lock promptly
- Wrap lameenc return values in bytes() to fix Starlette bytearray type error
Add _patch_torchaudio_backend(): monkey-patch torchaudio to default to soundfile backend, working around SoX segfault
Change --device default from cpu to auto, add cuda option, use resolve_device() for auto GPU detection
Customize uvicorn log config to add timestamps to access logs
Add request/complete logging with elapsed time and audio chunk count
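
The background thread + queue model described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in app.py: `put_with_deadline` and `stream_from_worker` are hypothetical names, `make_chunks` stands in for whatever produces audio chunks and must return a generator, and `lock` plays the role of `_cpu_execution_lock`.

```python
import queue
import threading
import time

_SENTINEL = object()  # marks the end of the stream


def put_with_deadline(q, item, deadline_s=30.0, poll_s=0.1, cancelled=None):
    """Try to enqueue item until the deadline passes; return False on give-up."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if cancelled is not None and cancelled.is_set():
            return False
        try:
            q.put(item, timeout=poll_s)
            return True
        except queue.Full:
            continue
    return False


def stream_from_worker(make_chunks, lock, maxsize=8, deadline_s=30.0):
    """Yield bytes produced by a worker thread that holds the lock.

    The lock is held only inside the worker, never inside this iterator,
    so an abandoned iterator cannot keep the lock.
    """
    q = queue.Queue(maxsize=maxsize)
    cancelled = threading.Event()

    def worker():
        try:
            with lock:
                gen = make_chunks()
                try:
                    for chunk in gen:
                        # bytes() also normalizes bytearray chunks.
                        if not put_with_deadline(q, bytes(chunk), deadline_s, cancelled=cancelled):
                            return
                finally:
                    gen.close()  # stop generation promptly
        finally:
            # Runs after the lock is released.
            put_with_deadline(q, _SENTINEL, deadline_s, cancelled=cancelled)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    try:
        while True:
            try:
                item = q.get(timeout=0.5)
            except queue.Empty:
                if t.is_alive():
                    continue
                try:
                    item = q.get_nowait()  # final drain after worker exit
                except queue.Empty:
                    break
            if item is _SENTINEL:
                break
            yield item
    finally:
        cancelled.set()  # consumer finished or disconnected
```

With this shape, closing the iterator (for example on client disconnect) sets the cancel flag, and the worker leaves the `with lock` block the next time it tries to enqueue a chunk.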

app_onnx.py, infer.py

Add _patch_torchaudio_backend() to fix SoX segfault

moss_tts_nano_runtime.py

Remove 8 voice presets whose audio files don't exist in the repository (Zhiming, Weiguo, Trump, Nathan, Sakura, Aoi, Hina, Mei), keeping only the 8 with actual files
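
A check of the kind that motivates this cleanup could look like the sketch below. `split_presets_by_file` is a hypothetical helper, and the preset-to-filename mapping passed to it is an assumed shape rather than the actual structure in moss_tts_nano_runtime.py.

```python
from pathlib import Path


def split_presets_by_file(presets, audio_dir):
    """Split a {preset_name: filename} mapping by whether the file exists.

    Returns (present, missing) dictionaries.
    """
    audio_dir = Path(audio_dir)
    present, missing = {}, {}
    for name, filename in presets.items():
        target = present if (audio_dir / filename).is_file() else missing
        target[name] = filename
    return present, missing
```

Running such a check at startup and logging the `missing` mapping would surface stale presets before a request hits them.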

pyproject.toml, requirements.txt

Add lameenc>=1.7.0 dependency (MP3 encoding)
Add openai_audio_api module declaration

Test plan

WAV format returns valid audio (RIFF PCM 16-bit stereo 48kHz)
MP3 format returns valid audio (MPEG layer III, 128kbps, 48kHz), plays correctly in mpv
PCM format returns valid raw audio data
Consecutive requests don't deadlock (lock released correctly)
Service recovers after client disconnect (_put deadline mechanism)
Invalid voice/params return OpenAI-format error JSON (HTTP 400)
GPU auto-detection works when CUDA is available
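
The first test-plan item can be checked programmatically on the response bytes. The sketch below is an assumption about how one might do that; `describe_wav` is a hypothetical helper and it only handles the canonical layout in which the `fmt ` chunk directly follows the RIFF header.

```python
import struct


def describe_wav(data):
    """Return the format fields of a canonical RIFF/WAVE byte string."""
    if len(data) < 44 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    if data[12:16] != b"fmt ":
        raise ValueError("expected the fmt chunk right after the RIFF header")
    # fmt payload starts at offset 20: tag, channels, rate, byte rate, align, bits
    fmt_tag, channels, sample_rate, _byte_rate, _align, bits = struct.unpack_from(
        "<HHIIHH", data, 20
    )
    return {
        "format_tag": fmt_tag,  # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "bits_per_sample": bits,
    }
```

For the WAV case in the test plan, the expected result would be format tag 1, 2 channels, a 48000 Hz sample rate, and 16 bits per sample.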


guoqiangui · 2026-04-26T04:47:54Z

missing openai_audio_api.py

- Added noova → Lingyu, ballad → 男播音, yangmi → 杨幂, etc.
- Time format: 6:00 → 6点 (processed before colon replacement)
- Colon replacement: ： → ， (was 。, which caused repetition and content loss)
- Consecutive punctuation cleanup: ，。 or 。， collapsed to 。
- Temperature range: keep N°C and M°C as separate conversions (merging caused repetition)
- wav / mp3 / pcm / opus output formats
- Opus encoded via ffmpeg subprocess with atempo filter for speed adjustment

┌──────────────┬─────────────────────┬────────────────────┐
│              │ Before              │ After              │
├──────────────┼─────────────────────┼────────────────────┤
│ Architecture │ 7 chunks sequential │ 6 workers parallel │
├──────────────┼─────────────────────┼────────────────────┤
│ Latency      │ ~105s               │ ~22-25s            │
├──────────────┼─────────────────────┼────────────────────┤
│ Speedup      │ -                   │ ~4.5x              │
└──────────────┴─────────────────────┴────────────────────┘

- Pre-create 6 independent ONNX InferenceSession groups (~3MB each) at startup, each with its own thread pool
- Parallel chunk execution via concurrent.futures.ThreadPoolExecutor
- Each worker acquires an exclusive pool slot, temporarily swapping sessions and rng for thread safety
- Results reassembled in original chunk order
- Added detailed per-step timing logs (ONNX timing)
- Fixed default_cpu_threads using os.cpu_count(), which created a new runtime with _parallel_workers=1. Now uses configured thread_count
- Fixed synthesize_stream signature compatibility with PyTorch-specific parameters
- Added _create_sessions_with_threads() to create session groups with a specified thread count (used by the parallel pool)
- Refactored _create_sessions() to delegate to the new method
- resolve_prompt_audio_codes fallback: when a voice is not in the builtin manifest, automatically load the corresponding wav from assets/audio/ and encode on the fly
- Also checks the PyTorch preset map (_DEFAULT_VOICE_FILES) for wav filename resolution
- Supports voices like 杨幂 (zh_11.wav), 男播音 (zh_10.wav) not present in the ONNX manifest
- Added 男播音 (zh_10.wav) and 杨幂 (zh_11.wav) presets
- Added chunk splitting logs to the OpenAI endpoint
- Performance tuning: max_new_frames=200, voice_clone_max_memory_per_sample_gb=0.6, tts_max_batch_size=7

┌────────────────────────────────┬─────────┐
│ Config                         │ Latency │
├────────────────────────────────┼─────────┤
│ PyTorch + CUDA (before tuning) │ ~38s    │
├────────────────────────────────┼─────────┤
│ PyTorch + CUDA (after tuning)  │ ~22s    │
├────────────────────────────────┼─────────┤
│ ONNX + CPU (sequential)        │ ~105s   │
├────────────────────────────────┼─────────┤
│ ONNX + CPU (6-worker parallel) │ ~22-25s │
└────────────────────────────────┴─────────┘
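
The parallel chunk execution with exclusive pool slots and in-order reassembly can be sketched as below. This is a simplified illustration, not the PR's implementation: `synthesize_parallel` and `synth_fn` are hypothetical names, and a slot here is any opaque object standing in for one pre-created session group.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def synthesize_parallel(chunks, slots, synth_fn):
    """Run synth_fn(slot, chunk) for every chunk, one worker per slot.

    Each call borrows a slot exclusively and returns it afterwards.
    Results come back in the original chunk order.
    """
    pool = queue.Queue()
    for slot in slots:
        pool.put(slot)

    def run(chunk):
        slot = pool.get()  # blocks until a slot is free
        try:
            return synth_fn(slot, chunk)
        finally:
            pool.put(slot)  # always hand the slot back

    with ThreadPoolExecutor(max_workers=len(slots)) as executor:
        # Executor.map yields results in input order, regardless of
        # which chunk finishes first.
        return list(executor.map(run, chunks))
```

Relying on `Executor.map` avoids a separate sorting step, since completion order does not affect the order of the returned list.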

pinghe added 2 commits April 26, 2026 19:03

missing openai_audio_api.py

3f279d1

pinghe wants to merge 3 commits into OpenMOSS:main from pinghe:main
