OpenVINO Model Server now includes the audio/transcriptions and audio/translations endpoints, which follow the OpenAI API.
They run speech-to-text tasks with the OpenVINO GenAI pipeline.
Please see the OpenAI API Transcription Reference and OpenAI API Translation Reference for more information on the API.
There are two endpoints exposed:

- http://server_name:port/v3/audio/transcriptions
- http://server_name:port/v3/audio/translations

The request body must be in multipart/form-data format.

Transcription request:

curl -X POST http://localhost:8000/v3/audio/transcriptions \
  -F "model=OpenVINO/whisper-large-v3-fp16-ov" \
  -F "file=@speech_english.wav"

Translation request:

curl -X POST http://localhost:8000/v3/audio/translations \
  -F "model=OpenVINO/whisper-large-v3-fp16-ov" \
  -F "file=@speech_spanish.wav"

Example response:

{"text":"..."}

Request parameters for the transcriptions endpoint:

| Param | OpenVINO Model Server | OpenAI /audio/transcriptions API | Type | Description |
|---|---|---|---|---|
| model | ✅ | ✅ | string (required) | Name of the model to use. This can be omitted to fall back to URI-based routing; see the routing documentation for details. |
| file | ✅ | ✅ | file (required) | The audio file object to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. |
| language | ✅ | ✅ | string | The language of the input audio in ISO-639-1 format. Providing the language for a multilingual model may improve accuracy and performance. |
| chunking_strategy | ❌ | ✅ | "auto" or object | Controls how the audio is cut into chunks. |
| include | ❌ | ✅ | array | Additional information to include in the transcription response. |
| known_speaker_names | ❌ | ✅ | array | List of speaker names corresponding to the audio samples. |
| known_speaker_references | ❌ | ✅ | array | Optional list of audio samples with known speaker references matching known_speaker_names. |
| prompt | ❌ | ✅ | string | An optional text to guide the model's style or continue a previous audio segment. |
| response_format | ❌ | ✅ | string | The format of the output. |
| stream | ❌ | ✅ | boolean | Generate the response in streaming mode. |
| temperature | ✅ | ✅ | float (default: 1.0) | The sampling temperature; cannot be negative. |
| timestamp_granularities | ✅ | ✅ | array | The timestamp granularities to populate for this transcription. Supported values: "word" and "segment" (enable_word_timestamps: true must be set in graph.pbtxt). |

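
The multipart request and the supported fields above can also be assembled from Python. The following is a minimal sketch using only the standard library, not an official client: the helper names `build_multipart` and `make_transcription_request` are hypothetical, the server address and model name are taken from the curl examples, and the `language` and `temperature` values are purely illustrative. It also assumes the server accepts `application/octet-stream` as the content type of the file part.

```python
import os
import urllib.request
import uuid


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns a (body, content_type) pair ready to use in an HTTP POST.
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    # The audio payload goes in its own part and is appended as raw bytes.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def make_transcription_request(audio_path, base_url="http://localhost:8000/v3"):
    """Prepare (but do not send) a POST to the transcriptions endpoint."""
    with open(audio_path, "rb") as handle:
        audio = handle.read()
    fields = {
        "model": "OpenVINO/whisper-large-v3-fp16-ov",
        "language": "en",      # ISO-639-1 code of the spoken language
        "temperature": "0.0",  # illustrative value; must not be negative
    }
    body, content_type = build_multipart(
        fields, "file", os.path.basename(audio_path), audio
    )
    return urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the object returned by `make_transcription_request("speech_english.wav")` to `urllib.request.urlopen` sends the request. Building the body separately from sending it keeps the encoding step easy to inspect without a running server.
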

Request parameters for the translations endpoint:

| Param | OpenVINO Model Server | OpenAI /audio/translations API | Type | Description |
|---|---|---|---|---|
| model | ✅ | ✅ | string (required) | Name of the model to use. This can be omitted to fall back to URI-based routing; see the routing documentation for details. |
| file | ✅ | ✅ | file (required) | The audio file object to translate, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm. |
| prompt | ❌ | ✅ | string | An optional text to guide the model's style or continue a previous audio segment. |
| response_format | ❌ | ✅ | string | The format of the output. |
| temperature | ✅ | ✅ | float (default: 1.0) | The sampling temperature; cannot be negative. |


Response fields for the transcriptions endpoint:

| Param | OpenVINO Model Server | OpenAI /audio/transcriptions API | Type | Description |
|---|---|---|---|---|
| text | ✅ | ✅ | string | The transcribed text. |
| logprobs | ❌ | ✅ | array | The log probabilities of the tokens in the transcription. |
| usage | ❌ | ✅ | object | Token usage statistics for the request. |

Response fields for the translations endpoint:

| Param | OpenVINO Model Server | OpenAI /audio/translations API | Type | Description |
|---|---|---|---|---|
| text | ✅ | ✅ | string | The translated text. |

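
Since `text` is the only response field supported on both endpoints, a client only needs to read that one key. A minimal sketch, assuming the response body is the JSON object shown in the example response; the helper name `extract_text` is hypothetical:

```python
import json


def extract_text(response_body):
    """Return the 'text' field from a transcription or translation response."""
    payload = json.loads(response_body)
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str):
        # Fields such as logprobs and usage are not returned by the server,
        # so a missing 'text' means the body is not a valid result.
        raise ValueError("response does not contain a string 'text' field")
    return text
```

For example, `extract_text(b'{"text": "hello"}')` yields the string `hello`.
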
The endpoints return an error for an incorrect request in the following condition:
- Any field is formatted incorrectly with respect to the schema
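
A client can surface such errors instead of failing on them. The sketch below is one possible approach with the standard library; the helper name `post_audio` is hypothetical, and because this page does not specify the status code or the format of the error body, the code returns both as received without interpreting them.

```python
import urllib.error
import urllib.request


def post_audio(request, opener=urllib.request.urlopen):
    """Send a prepared request and return (ok, status, body_text).

    `opener` is injectable so the function can be exercised without a server.
    """
    try:
        with opener(request) as response:
            return True, response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # The server rejected the request, for example because a field
        # did not match the schema. Return the details to the caller.
        return False, err.code, err.read().decode("utf-8", errors="replace")
```

Connection failures raise `urllib.error.URLError`, which this sketch deliberately leaves to the caller, so that a rejected request and an unreachable server remain distinguishable.
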

End-to-end demo with the transcription endpoint

End-to-end demo with the translation endpoint