
Commit d7fdc26

Fix smart endpointing provider docs

Co-authored-by: Sahil Suman <sahilsuman933@users.noreply.github.com>
1 parent c3aeabd commit d7fdc26

2 files changed: 26 additions & 46 deletions

fern/customization/speech-configuration.mdx (8 additions & 17 deletions)
@@ -39,27 +39,18 @@ This plan defines the parameters for when the assistant begins speaking after th
 - **End-of-turn prediction** - predicting when the current speaker is likely to finish their turn.
 - **Backchannel prediction** - detecting moments where a listener may provide short verbal acknowledgments like "uh-huh", "yeah", etc. to show engagement, without intending to take over the speaking turn. This is better handled by the assistant's stopSpeakingPlan.
 
-We offer different providers that can be audio-based, text-based, or audio-text based:
+Supported `smartEndpointingPlan.provider` values are:
 
-**Audio-based providers:**
+- **LiveKit (`livekit`)**: Recommended for English conversations. LiveKit can be fine-tuned using the `waitFunction` parameter to adjust response timing based on the probability that the user is still speaking.
+- **Vapi (`vapi`)**: Recommended for non-English conversations or as an alternative when LiveKit isn't suitable.
+- **Custom endpointing model (`custom-endpointing-model`)**: Use your own endpointing service by setting `smartEndpointingPlan.server.url`.
 
-- **Krisp**: Audio-based model that analyzes prosodic and acoustic features such as changes in intonation, pitch, and rhythm to detect when users finish speaking. Since it's audio-based, it always notifies when the user is done speaking, even for brief acknowledgments. Vapi offers configurable acknowledgement words and a well-configured stop speaking plan to handle this properly.
+If you want smart endpointing off, omit `smartEndpointingPlan`. In that case, the system uses transcriber end-of-turn detection when available.
 
-Configure Krisp with a threshold between 0 and 1 (default 0.5), where 1 means the user definitely stopped speaking and 0 means they're still speaking. Use lower values for snappier conversations and higher values for more conservative detection.
+**Transcriber-native end-of-turn (not `smartEndpointingPlan.provider` values):**
 
-When interacting with an AI agent, users may genuinely want to interrupt to ask a question or shift the conversation, or they might simply be using backchannel cues like "right" or "okay" to signal they're actively listening. The core challenge lies in distinguishing meaningful interruptions from casual acknowledgments. Since the audio-based model signals end-of-turn after each word, configure the stop speaking plan with the right number of words to interrupt, interruption settings, and acknowledgement phrases to handle backchanneling properly.
-
-**Audio-text based providers:**
-
-- **Deepgram Flux**: Deepgram's latest transcriber model with built-in conversational speech recognition. Flux combines high-quality speech-to-text with native turn detection, while delivering ultra-low latency and Nova-3 level accuracy.
-
-- **Assembly**: Transcriber that also reports end-of-turn detection. To use Assembly, choose it as your transcriber without setting a separate smart endpointing plan. As transcripts arrive, we consider the `end_of_turn` flag that Assembly sends to mark the end-of-turn, stream to the LLM, and generate a response.
-
-**Text-based providers:**
-
-- **Off**: Disabled by default. When smart endpointing is set to "Off", the system will automatically use the transcriber's end-of-turn detection if available. If no transcriber EOT detection is available, the system defaults to LiveKit if the language is set to English or to Vapi's standard endpointing mode.
-
-- **LiveKit**: Recommended for English conversations as it provides the most sophisticated solution for detecting natural speech patterns and pauses. LiveKit can be fine-tuned using the `waitFunction` parameter to adjust response timing based on the probability that the user is still speaking.
-
-- **Vapi**: Recommended for non-English conversations or as an alternative when LiveKit isn't suitable.
+- **Deepgram Flux**: Deepgram's transcriber with built-in conversational turn detection.
+- **Assembly**: Assembly transcriber that returns `end_of_turn` signals. Configure this on the transcriber without setting `smartEndpointingPlan`.
 
 ![LiveKit Smart Endpointing Configuration](../static/images/advanced-tab/livekit-smart-endpointing.png)
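To make the `waitFunction` mention in the added lines concrete, here is a minimal sketch of a `smartEndpointingPlan` using the `livekit` provider. The expression is illustrative only (assuming `x` is the model's probability that the user is still speaking), not a documented default; check the Vapi API reference for the exact `waitFunction` semantics.

```json
{
  "startSpeakingPlan": {
    "smartEndpointingPlan": {
      "provider": "livekit",
      "waitFunction": "200 + 8000 * x"
    }
  }
}
```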

fern/customization/voice-pipeline-configuration.mdx (18 additions & 29 deletions)
@@ -162,7 +162,7 @@ This plan is only used if `smartEndpointingPlan` is not set and transcriber does
 
 ### Smart endpointing
 
-Uses AI models to analyze speech patterns, context, and audio cues to predict when users have finished speaking. Only available for English conversations.
+Uses supported endpointing providers to predict when users have finished speaking.
 
 **Important:** If your transcriber has built-in end-of-turn detection (like Deepgram Flux or Assembly) and you don't configure a smart endpointing plan, the system will automatically use the transcriber's EOT detection instead of smart endpointing.

@@ -181,27 +181,19 @@ Uses AI models to analyze speech patterns, context, and audio cues to predict wh
 ```
 </Tab>
 <Tab title="Providers">
-**Text-based providers:**
+Supported values for `smartEndpointingPlan.provider`:
 - **livekit**: Advanced model trained on conversation data (English only)
-- **vapi**: VAPI-trained model (non-English conversations or LiveKit alternative)
-
-**Audio-based providers:**
-- **krisp**: Audio-based model analyzing prosodic features (intonation, pitch, rhythm)
-
-**Audio-text based providers:**
-- **deepgram-flux**: Deepgram's latest transcriber model with built-in conversational speech recognition. (English only)
-- **assembly**: Transcriber with built-in end-of-turn detection (English only)
+- **vapi**: Vapi endpointing model for multilingual conversations
+- **custom-endpointing-model**: Send endpointing requests to your own model using `smartEndpointingPlan.server.url`
 
 </Tab>
 </Tabs>
 
 **When to use:**
 
-- **Deepgram Flux**: English conversations using Deepgram as a transcriber.
-- **Assembly**: Best used when Assembly is already your transcriber provider for English conversations with integrated end-of-turn detection
 - **LiveKit**: English conversations where Deepgram is not the transcriber of choice.
 - **Vapi**: Non-English conversations with default stop speaking plan settings
-- **Krisp**: Non-English conversations with a robustly configured stop speaking plan
+- **Custom endpointing model**: Bring your own endpointing logic for specialized domains
 
 ### Deepgram Flux configuration
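As a sketch of the multilingual option above: selecting the `vapi` provider needs only the provider field, assuming no extra tuning parameters (none are documented in this diff).

```json
{
  "startSpeakingPlan": {
    "smartEndpointingPlan": {
      "provider": "vapi"
    }
  }
}
```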

@@ -305,31 +297,26 @@ The system continuously analyzes the latest user message and applies the first m
 - Scenarios requiring predictable, rule-based endpointing behavior
 - Fallback option when other smart endpointing providers aren't suitable
 
-### Krisp threshold configuration
+### Custom endpointing model configuration
 
-Krisp's audio-based model returns a probability between 0 and 1, where 1 means the user definitely stopped speaking and 0 means they're still speaking.
-
-**Threshold settings:**
-
-- **0.0-0.3:** Very aggressive detection - responds quickly but may interrupt users mid-sentence
-- **0.4-0.6:** Balanced detection (default: 0.5) - good balance between responsiveness and accuracy
-- **0.7-1.0:** Conservative detection - waits longer to ensure users have finished speaking
+Use `custom-endpointing-model` when you want full control over endpointing decisions.
 
 **Configuration example:**
 
 ```json
 {
   "startSpeakingPlan": {
     "smartEndpointingPlan": {
-      "provider": "krisp",
-      "threshold": 0.5
+      "provider": "custom-endpointing-model",
+      "server": {
+        "url": "https://your-endpointing-service.example.com/endpointing"
+      }
     }
   }
 }
 ```
 
-**Important considerations:**
-Since Krisp is audio-based, it always notifies when the user is done speaking, even for brief acknowledgments. Configure the stop speaking plan with appropriate `acknowledgementPhrases` and `numWords` settings to handle backchanneling properly.
+When configured, Vapi sends `call.endpointing.request` events to your server URL and expects a timeout-based response.
 
 ### Assembly turn detection
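A minimal sketch of a server that could answer the `call.endpointing.request` events described above, assuming the request body carries the conversation messages and the expected response is a JSON object with a timeout in seconds. The field names `messages` and `timeoutSeconds` are illustrative assumptions, not confirmed schema; verify them against Vapi's server-event reference before use.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def decide_timeout(event: dict) -> float:
    """Illustrative heuristic: wait longer when the last user utterance
    looks unfinished (no terminal punctuation)."""
    last_text = ""
    for msg in reversed(event.get("messages", [])):
        if msg.get("role") == "user":
            last_text = msg.get("content") or ""
            break
    unfinished = not last_text.rstrip().endswith((".", "?", "!"))
    return 1.5 if unfinished else 0.3


class EndpointingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the (assumed) JSON event body sent by Vapi.
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"timeoutSeconds": decide_timeout(event)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port: int = 8080) -> None:
    # Serve at the path configured in smartEndpointingPlan.server.url.
    HTTPServer(("", port), EndpointingHandler).serve_forever()
```

You would then point `smartEndpointingPlan.server.url` at wherever this handler is hosted.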

@@ -613,15 +600,17 @@ User Interrupts → Assistant Audio Stopped → backoffSeconds Blocks All Output
 
 **Optimized for:** Text-based endpointing with longer timeouts for different speech patterns and international support.
 
-### Audio-based endpointing (Krisp example)
+### Custom endpointing model example
 
 ```json
 {
   "startSpeakingPlan": {
     "waitSeconds": 0.4,
     "smartEndpointingPlan": {
-      "provider": "krisp",
-      "threshold": 0.5
+      "provider": "custom-endpointing-model",
+      "server": {
+        "url": "https://your-endpointing-service.example.com/endpointing"
+      }
     }
   },
   "stopSpeakingPlan": {

@@ -640,7 +629,7 @@ User Interrupts → Assistant Audio Stopped → backoffSeconds Blocks All Output
 }
 ```
 
-**Optimized for:** Non-English conversations with robust backchanneling configuration to handle audio-based detection limitations.
+**Optimized for:** Teams that need domain-specific endpointing behavior and want to run their own endpointing model.
 
 ### Audio-text based endpointing (Assembly example)
