Feature Request: Support OpenAI speech-to-text interface /v1/audio/transcriptions

### Prerequisites

- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the [README.md](https://github.com/ggml-org/llama.cpp/blob/master/README.md).
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the [Discussions](https://github.com/ggml-org/llama.cpp/discussions), and have a new and useful enhancement to share.

### Feature Description

Now that we have support for Gemma 4 STT (https://github.com/ggml-org/llama.cpp/pull/21421), we should consider implementing OpenAI's explicit speech-to-text API. Documentation is here https://platform.openai.com/docs/guides/speech-to-text

Example of `v1/audio/transcriptions`

```python
from openai import OpenAI
client = OpenAI()

audio_file= open("/path/to/file/audio.mp3", "rb")
transcription = client.audio.transcriptions.create(
    model="whisper-1", 
    file=audio_file
)

print(transcription.text)
```

The llama.cpp web interface already supports uploading audio for transcription but it's not the same API.

Supporting streaming transcriptions would be ideal, but non-streaming would be a great start.

### Motivation

Implementing the `/v1/audio/transcriptions` API will allow a wide variety of existing clients to use llama.cpp for STT.

This enhancement would particularly benefit Home Assistant users. Currently, HA users need to run a separate STT model (such as whisper.cpp or parakeet) and LLM (such as Gemma), which requires more resources and complexity. With this enhancement, HA users could just run Gemma 4 E2B/E4B and use it for both STT and LLM purposes.

### Possible Implementation

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature Request: Support OpenAI speech-to-text interface /v1/audio/transcriptions #21852

Prerequisites

Feature Description

Motivation

Possible Implementation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Feature Request: Support OpenAI speech-to-text interface /v1/audio/transcriptions #21852

Description

Prerequisites

Feature Description

Motivation

Possible Implementation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions