| title | Transcriber fallback configuration |
|---|---|
| subtitle | Configure fallback transcribers that activate automatically if your primary transcriber fails. |
| slug | customization/transcriber-fallback-plan |
Transcriber fallback configuration ensures your calls continue even if your primary speech-to-text provider experiences issues. Your assistant will sequentially fallback to the transcribers you configure, in the exact order you specify.
Key benefits:
- Call continuity during provider outages
- Automatic failover with no user intervention required
- Provider diversity to protect against single points of failure
When a transcriber failure occurs, Vapi will:
- Detect the failure of the primary transcriber
- Switch to the first fallback transcriber in your plan
- Continue through your specified list if subsequent failures occur
- Terminate only if all transcribers in your plan have failed
Add the fallbackPlan property to your assistant's transcriber configuration, and specify the fallback transcribers within the transcribers property.
{
"transcriber": {
"provider": "deepgram",
"model": "nova-3",
"language": "en",
"fallbackPlan": {
"transcribers": [
{
"provider": "assembly-ai",
"speechModel": "universal-streaming-multilingual",
"language": "en"
},
{
"provider": "azure",
"language": "en-US"
}
]
}
}
}Each transcriber provider supports different configuration options. Expand the accordion below to see available settings for each provider.
- **model**: Model selection (`nova-3`, `nova-3-general`, `nova-3-medical`, `nova-2`, `flux-general-en`, etc.). - **language**: Language code for transcription. - **keywords**: Keywords with optional boost values for improved recognition (e.g., `["companyname", "productname:2"]`). - **keyterm**: Keyterm prompting for up to 90% keyword recall rate improvement. - **smartFormat** (boolean): Enable smart formatting for numbers and dates. - **eotThreshold** (0.5-0.9): End-of-turn confidence threshold. Only available with Flux models. - **eotTimeoutMs** (500-10000): Maximum time to wait after speech before finalizing turn. Only available with Flux models. Default is 5000ms. - **language**: Language code (`multi` for multilingual, `en` for English). - **speechModel**: Streaming speech model (`universal-streaming-english` or `universal-streaming-multilingual`). - **wordBoost**: Custom vocabulary array (up to 2500 characters total). - **keytermsPrompt**: Array of keyterms for improved recognition (up to 100 terms, 50 characters each). Costs additional $0.04/hour. - **endUtteranceSilenceThreshold**: Duration of silence in milliseconds to detect end of utterance. - **disablePartialTranscripts** (boolean): Set to `true` to disable partial transcripts. - **confidenceThreshold** (0-1): Minimum confidence threshold for accepting transcriptions. Default is 0.4. - **vadAssistedEndpointingEnabled** (boolean): Enable VAD-based endpoint detection. - **language**: Language code in BCP-47 format (e.g., `en-US`, `es-MX`, `fr-FR`). - **segmentationSilenceTimeoutMs** (100-5000): Duration of silence after which a phrase is finalized. Configure to adjust sensitivity to pauses. - **segmentationMaximumTimeMs** (20000-70000): Maximum duration a segment can reach before being cut off. - **segmentationStrategy**: Controls phrase boundary detection. Options: `Default`, `Time`, or `Semantic`. - **model**: Model selection (`fast`, `accurate`, or `solaria-1`). - **language**: Language code. - **confidenceThreshold** (0-1): Minimum confidence for transcription acceptance. Default is 0.4. - **endpointing** (0.01-10): Time in seconds to wait before considering speech ended. - **speechThreshold** (0-1): Speech detection sensitivity (0.0 to 1.0). - **prosody** (boolean): Enable prosody detection (laugh, giggle, music, etc.). - **audioEnhancer** (boolean): Pre-process audio for improved accuracy (increases latency). - **transcriptionHint**: Hint text to guide transcription. - **customVocabularyEnabled** (boolean): Enable custom vocabulary. - **customVocabularyConfig**: Custom vocabulary configuration with vocabulary array and default intensity. - **region**: Processing region (`us-west` or `eu-west`). - **receivePartialTranscripts** (boolean): Enable partial transcript delivery. - **model**: Model selection (currently only `default`). - **language**: Language code. - **operatingPoint**: Accuracy level. `standard` for faster turnaround, `enhanced` for highest accuracy. Default is `enhanced`. - **region**: Processing region (`eu` for Europe, `us` for United States). Default is `eu`. - **enableDiarization** (boolean): Enable speaker identification for multi-speaker conversations. - **maxDelayMs**: Maximum delay in milliseconds for partial transcripts. Balances latency and accuracy. - **model**: Gemini model selection. - **language**: Language selection (e.g., `Multilingual`, `English`, `Spanish`, `French`). - **model**: OpenAI Realtime STT model selection (required). - **language**: Language code for transcription. - **model**: Model selection (currently only `scribe_v1`). - **language**: ISO 639-1 language code. - **model**: Model selection (currently only `ink-whisper`). - **language**: ISO 639-1 language code.- Use different providers for fallbacks to protect against provider-wide outages.
- Consider language compatibility when selecting fallbacks—ensure all fallback transcribers support your required languages.
- Test your fallback configuration to ensure smooth transitions between transcribers.
- For HIPAA/PCI compliance, ensure all fallbacks are compliant providers (Deepgram or Azure).