| external help file | PSOpenAI-help.xml |
|---|---|
| Module Name | PSOpenAI |
| online version | https://github.com/mkht/PSOpenAI/blob/main/Docs/Set-RealtimeTranscriptionSessionConfiguration.md |
| schema | 2.0.0 |
# Set-RealtimeTranscriptionSessionConfiguration

## SYNOPSIS
Set the realtime transcription session's configuration.
## SYNTAX

```
Set-RealtimeTranscriptionSessionConfiguration
    [-EventId <String>]
    [-InputAudioFormat <String>]
    [-InputAudioNoiseReductionType <String>]
    [-InputAudioTranscriptionModel <String>]
    [-InputAudioTranscriptionLanguage <String>]
    [-InputAudioTranscriptionPrompt <String>]
    [-EnableTurnDetection <Boolean>]
    [-TurnDetectionType <String>]
    [-TurnDetectionEagerness <String>]
    [-TurnDetectionThreshold <Single>]
    [-TurnDetectionPrefixPadding <UInt16>]
    [-TurnDetectionSilenceDuration <UInt16>]
    [-CreateResponseOnTurnEnd <Boolean>]
    [-InterruptResponse <Boolean>]
    [-Include <String[]>]
```
## DESCRIPTION
Set the realtime transcription session's configuration.
## EXAMPLES

### Example 1
```powershell
PS C:\> Set-RealtimeTranscriptionSessionConfiguration `
    -InputAudioTranscriptionModel 'gpt-4o-transcribe' `
    -InputAudioTranscriptionLanguage 'de' `
    -EnableTurnDetection $true `
    -TurnDetectionType 'server_vad' `
    -TurnDetectionThreshold 0.5
```

This example configures the session to transcribe German audio with the gpt-4o-transcribe model, using server-side voice activity detection.

## PARAMETERS

### -EnableTurnDetection
Enables the server VAD mode. In this mode, the server will run voice activity detection (VAD) over the incoming audio and respond after the end of speech.
```yaml
Type: Boolean
Required: False
Position: Named
```

### -EventId
Optional client-generated ID used to identify this event.
```yaml
Type: String
Required: False
Position: Named
```

### -InputAudioFormat
The format of input audio. Options are `pcm16`, `g711_ulaw`, or `g711_alaw`.
```yaml
Type: String
Required: False
Position: Named
```

### -InputAudioNoiseReductionType
Type of noise reduction. `none` disables noise reduction, `near_field` is for close-talking microphones such as headphones, and `far_field` is for far-field microphones such as laptop or conference room microphones.
```yaml
Type: String
Required: False
Position: Named
```
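For instance, a session capturing audio from a headset microphone might enable near-field noise reduction. A minimal sketch, assuming a realtime transcription session is already connected:

```powershell
# Enable noise reduction tuned for a close-talking (headset) microphone.
Set-RealtimeTranscriptionSessionConfiguration -InputAudioNoiseReductionType 'near_field'
```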
### -InputAudioTranscriptionModel
The model to use for transcription. Current options are `gpt-4o-transcribe`, `gpt-4o-mini-transcribe`, and `whisper-1`.

```yaml
Type: String
Required: False
Position: Named
Default value: whisper-1
```

### -InputAudioTranscriptionLanguage
The language of the input audio. Supplying the input language in ISO-639-1 format (e.g. `en`) will improve accuracy and latency.
```yaml
Type: String
Required: False
Position: Named
```

### -InputAudioTranscriptionPrompt
An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
```yaml
Type: String
Required: False
Position: Named
```
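The language and prompt hints are often combined: declaring the language and supplying domain vocabulary can reduce misrecognitions. A sketch, assuming an active session transcribing English medical audio; the prompt text is purely illustrative:

```powershell
# Hint the expected language and steer the transcriber toward domain terms.
Set-RealtimeTranscriptionSessionConfiguration `
    -InputAudioTranscriptionModel 'gpt-4o-transcribe' `
    -InputAudioTranscriptionLanguage 'en' `
    -InputAudioTranscriptionPrompt 'Medical consultation. Expect terms such as hypertension and metformin.'
```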
### -TurnDetectionPrefixPadding
Amount of audio to include before the VAD-detected speech (in milliseconds).

```yaml
Type: UInt16
Required: False
Position: Named
```

### -TurnDetectionSilenceDuration
Duration of silence to detect speech stop (in milliseconds). With shorter values the model will respond more quickly, but may jump in on short pauses from the user.
```yaml
Type: UInt16
Required: False
Position: Named
```

### -TurnDetectionThreshold
Activation threshold for VAD (0.0 to 1.0); defaults to 0.5. A higher threshold will require louder audio to activate the model, and thus might perform better in noisy environments.
```yaml
Type: Single
Required: False
Position: Named
```
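In a noisy environment you might raise the activation threshold and lengthen the silence window so the model is slower to cut in. A sketch using only the parameters documented here; the specific values are illustrative, not recommendations:

```powershell
# Server VAD tuned for a noisy room: louder audio is required to trigger
# detection, and a longer silence (800 ms) must pass before the turn ends.
Set-RealtimeTranscriptionSessionConfiguration `
    -EnableTurnDetection $true `
    -TurnDetectionType 'server_vad' `
    -TurnDetectionThreshold 0.7 `
    -TurnDetectionPrefixPadding 300 `
    -TurnDetectionSilenceDuration 800
```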
### -TurnDetectionType
Type of turn detection. `server_vad` automatically chunks the audio based on periods of silence, while `semantic_vad` chunks the audio when the model believes, based on the words said by the user, that they have completed their utterance.

```yaml
Type: String
Required: False
Position: Named
Default value: server_vad
```

### -TurnDetectionEagerness
Used only for `semantic_vad` mode. The eagerness of the model to respond. `low` will wait longer for the user to continue speaking; `high` will respond more quickly. `auto` is the default and is equivalent to `medium`.
```yaml
Type: String
Required: False
Position: Named
Default value: auto
```
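To let the model decide from the words themselves when an utterance is complete, switch to semantic VAD. A minimal sketch, assuming an active transcription session; `high` here trades patience for lower response latency:

```powershell
# Semantic VAD: audio is chunked on utterance content rather than silence.
# 'high' eagerness makes the model respond more quickly after the user stops.
Set-RealtimeTranscriptionSessionConfiguration `
    -EnableTurnDetection $true `
    -TurnDetectionType 'semantic_vad' `
    -TurnDetectionEagerness 'high'
```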
### -CreateResponseOnTurnEnd
Not available for transcription sessions.

```yaml
Type: Boolean
Required: False
Position: Named
Default value: True
```

### -InterruptResponse
Not available for transcription sessions.
```yaml
Type: Boolean
Required: False
Position: Named
Default value: True
```