feat: add OpenAI-compatible Audio Speech API endpoint #41
Open
pinghe wants to merge 1 commit into OpenMOSS:main
Summary

- Add OpenAI-compatible POST /v1/audio/speech endpoint for TTS
- Fix torchaudio SoX backend segfault
- Fix voice preset mappings referencing non-existent audio files
- Enable GPU inference (was hardcoded to CPU)
- Add timestamps to uvicorn access logs

Changes

New file: openai_audio_api.py

- OpenAI /v1/audio/speech request/response models (SpeechRequest, make_error_response)
- Voice mapping: OpenAI voice names to MOSS-TTS-Nano presets (alloy to Junhao, echo to Xiaoyu, etc.)
- Audio format helpers: WAV header construction, PCM 16-bit encoding, MP3 encoding via lameenc
- Streaming generators: iter_pcm_audio, generate_wav_stream, generate_mp3_stream, generate_pcm_stream
- Supports wav, mp3, and pcm response formats

app.py

- Add OpenAI-compatible endpoint using background thread + queue streaming model
- Avoids holding _cpu_execution_lock inside ASGI streaming iterator (prevents deadlock on client disconnect)
- _put() has 30s deadline to prevent threads from blocking indefinitely when client disconnects
- Explicit events_gen.close() to release the lock promptly
- Wrap lameenc return values in bytes() to fix Starlette bytearray type error
- Add _patch_torchaudio_backend(): monkey-patch torchaudio to default to soundfile backend, working around SoX segfault
- Change --device default from cpu to auto, add cuda option, use resolve_device() for auto GPU detection
- Customize uvicorn log config to add timestamps to access logs
- Add request/complete logging with elapsed time and audio chunk count

app_onnx.py, infer.py

- Add _patch_torchaudio_backend() to fix SoX segfault

moss_tts_nano_runtime.py

- Remove 8 voice presets whose audio files don't exist in the repository (Zhiming, Weiguo, Trump, Nathan, Sakura, Aoi, Hina, Mei), keeping only the 8 with actual files

pyproject.toml, requirements.txt

- Add lameenc>=1.7.0 dependency (MP3 encoding)
- Add openai_audio_api module declaration

Test plan

- WAV format returns valid audio (RIFF PCM 16-bit stereo 48kHz)
- MP3 format returns valid audio (MPEG layer III, 128kbps, 48kHz), plays correctly in mpv
- PCM format returns valid raw audio data
- Consecutive requests don't deadlock (lock released correctly)
- Service recovers after client disconnect (_put deadline mechanism)
- Invalid voice/params return OpenAI-format error JSON (HTTP 400)
- GPU auto-detection works when CUDA is available
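
The background thread + queue streaming model described for app.py can be illustrated roughly as follows. This is a minimal sketch, not the PR's code: `stream_via_worker`, `make_chunks`, and `cancelled` are hypothetical names, and the plain `threading.Lock` only stands in for `_cpu_execution_lock`. It shows the three ideas from the description: the worker thread (not the streaming iterator) holds the lock, `_put()` gives up after a deadline, and the event generator is closed explicitly so the lock is released promptly.

```python
import queue
import threading
import time

_SENTINEL = object()  # marks normal end of stream


def stream_via_worker(make_chunks, lock, put_deadline_s=30.0, maxsize=4):
    """Yield chunks produced by a worker thread; only the worker holds `lock`."""
    q = queue.Queue(maxsize=maxsize)
    cancelled = threading.Event()  # set when the consumer goes away

    def _put(item):
        # Stop trying once the deadline passes or the consumer has gone away.
        deadline = time.monotonic() + put_deadline_s
        while not cancelled.is_set():
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            try:
                q.put(item, timeout=min(0.1, remaining))
                return True
            except queue.Full:
                continue
        return False

    def _worker():
        finished = True
        with lock:
            events_gen = make_chunks()
            try:
                for chunk in events_gen:
                    if not _put(chunk):
                        finished = False
                        break
            finally:
                events_gen.close()  # explicit close before leaving the lock
        if finished:
            _put(_SENTINEL)

    thread = threading.Thread(target=_worker, daemon=True)
    thread.start()
    try:
        while True:
            try:
                item = q.get(timeout=0.1)
            except queue.Empty:
                # Worker gave up without a sentinel: nothing more will arrive.
                if not thread.is_alive() and q.empty():
                    return
                continue
            if item is _SENTINEL:
                return
            yield item
    finally:
        cancelled.set()  # consumer closed or finished: unblock the worker
```

In an ASGI app the returned generator would typically be handed to a streaming response. When the client disconnects and the generator is closed, the `finally` clause signals the worker, which leaves the `with lock:` block shortly afterwards, so the next request is not blocked.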
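
The WAV check in the test plan (RIFF PCM 16-bit stereo 48kHz) could be reproduced offline along these lines. This is a hedged sketch using only the Python standard library: `make_wav_header` and `float_to_pcm16` are hypothetical helpers, not the functions in openai_audio_api.py, and the header assumes the payload length is known up front, which a streaming implementation may handle differently.

```python
import io
import struct
import wave


def make_wav_header(data_len, sample_rate=48000, channels=2, bits=16):
    """Build a canonical 44-byte RIFF/WAVE header for uncompressed PCM."""
    block_align = channels * bits // 8
    byte_rate = sample_rate * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_len, b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate, byte_rate, block_align, bits,
        b"data", data_len,
    )


def float_to_pcm16(samples):
    """Clip floats to [-1, 1] and encode them as little-endian signed 16-bit."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def describe_wav(wav_bytes):
    """Parse the bytes with the stdlib wave module and report the format."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return {
            "channels": w.getnchannels(),
            "sample_width": w.getsampwidth(),
            "sample_rate": w.getframerate(),
            "frames": w.getnframes(),
        }


# Four interleaved stereo frames (L, R, L, R, ...), including out-of-range values.
pcm = float_to_pcm16([0.0, 0.5, -0.5, 1.0, -1.0, 2.0, -2.0, 0.0])
wav_bytes = make_wav_header(len(pcm)) + pcm
info = describe_wav(wav_bytes)
```

The same `describe_wav` check could be pointed at a response body saved from the endpoint to confirm that the channel count, sample width, and sample rate match what the test plan states.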