Commit e256efc

Update documents
1 parent 11a2068 commit e256efc

2 files changed

Lines changed: 73 additions & 6 deletions

Docs/Request-AudioTranscription.md

Lines changed: 70 additions & 3 deletions
@@ -15,12 +15,18 @@ Transcribes audio into the input language.
1515
### Language (Default)
1616
```
1717
Request-AudioTranscription
18-
[[-File] <String>]
18+
[-File] <String>
1919
[-Model <String>]
2020
[-Prompt <String>]
2121
[-ResponseFormat <String>]
2222
[-Temperature <Double>]
2323
[-Include <String[]>]
24+
[-KnownSpeakerNames <String[]>]
25+
[-KnownSpeakerReferences <String[]>]
26+
[-ChunkingStrategy <String>]
27+
[-ChunkingStrategyThreshold <Float>]
28+
[-ChunkingStrategyPrefixPadding <UInt16>]
29+
[-ChunkingStrategySilenceDuration <UInt16>]
2430
[-TimestampGranularities <String[]>]
2531
[-Language <String>]
2632
[-Stream]
@@ -48,6 +54,12 @@ PS C:\> Request-AudioTranscription -File C:\sample\audio.mp3 -ResponseFormat tex
4854
Hello, I am david.
4955
```
5056

57+
### Example 2: Speaker diarization
58+
```PowerShell
59+
PS C:\> $JsonResult = Request-AudioTranscription -File C:\sample\meeting.mp3 -Model gpt-4o-transcribe-diarize -ResponseFormat diarized_json
60+
PS C:\> $JsonResult | ConvertFrom-Json
61+
```
62+
5163
## PARAMETERS
5264

5365
### -File
@@ -57,7 +69,7 @@ The audio file to transcribe, in one of these formats: `flac`, `mp3`, `mp4`, `mp
5769
```yaml
5870
Type: String
5971
Required: True
60-
Position: 1
72+
Position: 0
6173
Accept pipeline input: True (ByValue)
6274
```
6375
@@ -83,7 +95,7 @@ Position: Named
8395
```
8496

8597
### -ResponseFormat
86-
The format of the transcript output, in one of these options: `json`, `text`, `srt`, `verbose_json`, or `vtt`.
98+
The format of the transcript output, in one of these options: `json`, `text`, `srt`, `verbose_json`, `vtt`, or `diarized_json`.
8799
The default value is `text`.
88100

89101
```yaml
@@ -114,6 +126,61 @@ Required: False
114126
Position: Named
115127
```
116128

129+
### -KnownSpeakerNames
130+
Optional list of speaker names that correspond to the audio samples provided in `-KnownSpeakerReferences`. Each entry should be a short identifier (for example, `customer` or `agent`). Up to 4 speakers are supported.
131+
132+
```yaml
133+
Type: String[]
134+
Required: False
135+
Position: Named
136+
```
137+
138+
### -KnownSpeakerReferences
139+
Optional list of audio samples that contain known speaker references matching `-KnownSpeakerNames`. Each sample must be between 2 and 10 seconds long and can use any of the input audio formats supported by `-File`.
140+
141+
```yaml
142+
Type: String[]
143+
Required: False
144+
Position: Named
145+
```
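
As a rough sketch of how the two speaker parameters might be used together, the snippet below builds both lists from one ordered table so that each name stays paired with its reference clip. The file paths are hypothetical, and both the positional pairing and the use of local file paths for `-KnownSpeakerReferences` are assumptions; the parameter description only says it takes audio samples.

```PowerShell
# Hypothetical paths; each reference clip should be 2 to 10 seconds long.
$Speakers = [ordered]@{
    agent    = 'C:\sample\agent.mp3'
    customer = 'C:\sample\customer.mp3'
}

$Params = @{
    File                   = 'C:\sample\meeting.mp3'
    Model                  = 'gpt-4o-transcribe-diarize'
    ResponseFormat         = 'diarized_json'
    # Assumption: names and references are matched by position,
    # so both arrays are derived from the same ordered table.
    KnownSpeakerNames      = @($Speakers.Keys)
    KnownSpeakerReferences = @($Speakers.Values)
}

# Splat the parameters and parse the JSON result, as in Example 2.
Request-AudioTranscription @Params | ConvertFrom-Json
```

Deriving both arrays from a single table avoids the two lists drifting out of step when a speaker is added or removed.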
146+
147+
### -ChunkingStrategy
148+
Controls how the audio is cut into chunks. Options are: `auto`, `server_vad`.
149+
The default value is `auto`.
150+
151+
```yaml
152+
Type: String
153+
Required: False
154+
Position: Named
155+
```
156+
157+
### -ChunkingStrategyThreshold
158+
Sensitivity threshold (0.0 to 1.0) for voice activity detection.
159+
160+
```yaml
161+
Type: Float
162+
Required: False
163+
Position: Named
164+
```
165+
166+
### -ChunkingStrategyPrefixPadding
167+
Amount of audio to include before the VAD-detected speech (in milliseconds).
168+
169+
```yaml
170+
Type: UInt16
171+
Required: False
172+
Position: Named
173+
```
174+
175+
### -ChunkingStrategySilenceDuration
176+
Duration of silence (in milliseconds) after which speech is considered to have stopped.
177+
178+
```yaml
179+
Type: UInt16
180+
Required: False
181+
Position: Named
182+
```
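
The chunking parameters above could be combined along these lines. This is a minimal sketch: the numeric values are illustrative rather than documented defaults, and it assumes the three tuning parameters only matter when `-ChunkingStrategy` is `server_vad`.

```PowerShell
$Params = @{
    File                            = 'C:\sample\meeting.mp3'
    Model                           = 'gpt-4o-transcribe-diarize'
    ResponseFormat                  = 'diarized_json'
    ChunkingStrategy                = 'server_vad'
    # Illustrative values only, not documented defaults.
    ChunkingStrategyThreshold       = 0.5   # VAD sensitivity, 0.0 to 1.0
    ChunkingStrategyPrefixPadding   = 300   # ms of audio kept before detected speech
    ChunkingStrategySilenceDuration = 500   # ms of silence treated as end of speech
}

Request-AudioTranscription @Params
```

Splatting keeps the long parameter names readable and makes it easy to adjust one value at a time while comparing results.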
183+
117184
### -TimestampGranularities
118185
The timestamp granularities to populate for this transcription. Any of these options: `word`, or `segment`. The default is `segment`.
119186

Docs/Request-AudioTranslation.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ Translates audio into English.
1414

1515
```
1616
Request-AudioTranslation
17-
[[-File] <String>]
17+
[-File] <String>
1818
[-Model <String>]
1919
[-Prompt <String>]
2020
[-ResponseFormat <String>]
@@ -49,8 +49,8 @@ The audio file to translate, in one of these formats: `flac`, `mp3`, `mp4`, `mpe
4949

5050
```yaml
5151
Type: String
52-
Required: False
53-
Position: 1
52+
Required: True
53+
Position: 0
5454
Accept pipeline input: True (ByValue)
5555
```
5656
