
Feature Request: Support OpenAI speech-to-text interface /v1/audio/transcriptions #21852

@candrews

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Now that we have support for Gemma 4 STT (#21421), we should consider implementing OpenAI's speech-to-text API. The documentation is here: https://platform.openai.com/docs/guides/speech-to-text

Example of /v1/audio/transcriptions

```python
from openai import OpenAI

client = OpenAI()

with open("/path/to/file/audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcription.text)
```

The llama.cpp web interface already supports uploading audio for transcription, but it does not use this API.
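For reference, the OpenAI endpoint accepts a `multipart/form-data` POST with (at minimum) a `file` part and a `model` part, and returns a JSON object with a `text` field. Below is a minimal sketch of that request shape using only the Python standard library; `build_transcription_request` is a hypothetical helper for illustration, not existing llama.cpp or OpenAI code.

```python
import io
import uuid


def build_transcription_request(audio_bytes: bytes, model: str) -> tuple[bytes, str]:
    """Build a multipart/form-data body matching the shape of
    POST /v1/audio/transcriptions: a 'model' field and a 'file' field.
    Hypothetical illustration of the wire format, not a real client."""
    boundary = uuid.uuid4().hex
    buf = io.BytesIO()

    def part(headers: str, payload: bytes) -> None:
        # Each part: boundary line, part headers, blank line, payload.
        buf.write(f"--{boundary}\r\n{headers}\r\n\r\n".encode())
        buf.write(payload)
        buf.write(b"\r\n")

    part('Content-Disposition: form-data; name="model"', model.encode())
    part(
        'Content-Disposition: form-data; name="file"; filename="audio.mp3"\r\n'
        "Content-Type: audio/mpeg",
        audio_bytes,
    )
    buf.write(f"--{boundary}--\r\n".encode())  # closing boundary
    return buf.getvalue(), f"multipart/form-data; boundary={boundary}"
```

On success the server replies with JSON such as `{"text": "..."}` (the `json` response format), which is what `transcription.text` in the SDK example above reads.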

Supporting streaming transcriptions would be ideal, but non-streaming would be a great start.
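If streaming is implemented, OpenAI's documented streaming shape emits server-sent events with incremental `transcript.text.delta` payloads followed by a final `transcript.text.done` event (treat the exact event names as an assumption to be checked against the current docs). A client-side sketch of assembling the final text from such events:

```python
def accumulate_stream(events: list[dict]) -> str:
    """Assemble the final transcript from streamed transcription events.
    Event types follow OpenAI's documented streaming shape
    (transcript.text.delta / transcript.text.done); this is an assumption
    for illustration, not llama.cpp code."""
    pieces: list[str] = []
    for ev in events:
        if ev.get("type") == "transcript.text.delta":
            pieces.append(ev["delta"])  # incremental text chunk
        elif ev.get("type") == "transcript.text.done":
            # The done event carries the full text; fall back to the deltas.
            return ev.get("text", "".join(pieces))
    return "".join(pieces)
```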

Motivation

Implementing the /v1/audio/transcriptions API will allow a wide variety of existing clients to use llama.cpp for STT.

This enhancement would particularly benefit Home Assistant users. Currently, HA users must run a separate STT model (such as whisper.cpp or parakeet) alongside an LLM (such as Gemma), which adds resource cost and complexity. With this enhancement, HA users could run just Gemma 4 E2B/E4B and use it for both STT and LLM purposes.

Possible Implementation

No response

Labels

enhancement (New feature or request)