
Commit 214099e

docs(feat): add monitoring and optimization workflow guides
Add two guides covering the "monitor & improve" stages (skeleton format): **Monitoring & Operating guide:** - Reframed title from "Monitoring" to "Monitoring & Operating" - Operating voice AI systems introduction (real-time performance, cost, quality) - Tools at a glance (Boards, Insights API, Analytics API, Langfuse, Webhook-to-External) - Placeholder sections for tool details, alerting strategies, best practices - Focus on operational reliability and continuous visibility **Optimization workflows guide:** - Optimization as continuous improvement loop (not a dedicated tool) - 7-step workflow (Detect → Extract → Hypothesize → Change → Test → Deploy → Verify) - Optimization mindset and why it matters - Placeholder sections for detailed steps, common scenarios, best practices - Cross-functional workflow using tools from all previous stages Both pages use skeleton format with complete intros and VAPI validation questions, awaiting tool clarification and detailed content development in iteration 2.
1 parent 8d6dc9d commit 214099e

2 files changed

Lines changed: 323 additions & 0 deletions

File tree

fern/observability/monitoring.mdx
fern/observability/optimization-workflows.mdx

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
---
title: Monitoring & Operating
subtitle: Visualize trends, track operational health, and ensure production reliability
slug: observability/monitoring
---

## What is monitoring and operating?

**Monitoring & Operating** means running your voice AI system in production with continuous visibility into its health and performance. This stage answers critical operational questions:

- How many calls are happening right now?
- What's my average call cost this week?
- Is my success rate dropping?
- Are any assistants experiencing unusual error rates?
- When should I be alerted about problems?

**Operating a voice AI system** requires more than traditional software monitoring. Voice AI systems have unique operational characteristics:

- **Real-time performance matters** — Latency, interruption handling, and voice quality directly impact user experience
- **Cost scales with usage** — Every call has LLM, TTS, and STT costs that must be tracked
- **Quality is subjective** — Success isn't just "200 OK" — it's whether the conversation achieved its goal
- **Failures are multi-layered** — Issues can occur in the LLM, voice pipeline, tool execution, or external integrations

**The goal**: Catch problems early (before customers complain), understand operational patterns, and maintain production reliability.

---

## Monitoring & Operating tools at a glance

| Tool | What it does | Best for |
|------|--------------|----------|
| **Boards** | Drag-and-drop visual dashboards with charts, metrics, and global filters. Queries scalar Structured Output fields. | Real-time operational visibility, team dashboards, custom reporting |
| **Insights API** | [TBD: Programmatic querying and alerting capabilities?] | [TBD: Automated alerts, custom monitoring logic?] |
| **Analytics API** | [TBD: Aggregated operational metrics?] | [TBD: Cost tracking, performance monitoring?] |
| **Langfuse Integration** | Real-time observability platform integration for call monitoring and tracing | End-to-end observability, LLM performance tracking, distributed tracing |
| **Webhook-to-External** | Export call data to third-party monitoring platforms (Datadog, Braintrust, Grafana, custom dashboards) | Enterprise monitoring stacks, unified observability across systems, custom alerting |

<span className="vapi-validation">Confirm this list of monitoring tools is complete and accurate. Need clarification on: What are the key capabilities and use cases for Insights API vs Analytics API? How do they differ? When should users choose one over the other? What monitoring capabilities does Langfuse provide beyond basic call data? Are there other built-in or recommended monitoring integrations? What's the roadmap for built-in alerting capabilities?</span>

---

## Boards

**[Placeholder - Full detail section]**

**[Build your first dashboard in Boards quickstart](/observability/boards-quickstart)**

---

## Analytics API

**[Placeholder - Full detail section]**

<span className="internal-note">What's the difference between Analytics API and Insights API? What are Analytics API's key capabilities? When should users choose Analytics API vs Insights API vs Boards?</span>

---

## Insights API

**[Placeholder - Full detail section]**

<Warning>
**Insights API is currently undocumented**. If you need flexible querying or programmatic alerting, contact Vapi support for guidance.
</Warning>

<span className="internal-note">Should Insights API be formally documented? What's the relationship between Insights API and Analytics API? Is Insights API the primary alerting mechanism, or are built-in alerts planned?</span>

---

## Langfuse Integration

**[Placeholder - Full detail section]**

<span className="vapi-validation">What are Langfuse's key capabilities for Vapi users? Does it provide real-time alerting? What metrics/traces does it capture? Are there setup requirements or limitations?</span>

---

## Webhook-to-External Monitoring

**[Placeholder - Full detail section]**

<span className="vapi-validation">What are recommended third-party monitoring platforms for Vapi (Datadog, Braintrust, etc.)? Are there integration guides or examples? What webhook events are most useful for monitoring?</span>
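
As an illustration of the export pattern, here is a minimal sketch (TypeScript with Express) of a webhook receiver that forwards end-of-call data to an external metrics endpoint. The event type check and payload field names (`message.cost`, `message.durationSeconds`, `message.endedReason`), plus the ingest URL, are assumptions for illustration; verify them against the webhook payloads your server actually receives.

```typescript
import express from "express";

const app = express();
app.use(express.json());

app.post("/vapi/webhook", async (req, res) => {
  const { message } = req.body;

  // Forward only end-of-call summaries; ignore other event types.
  // NOTE: field names below are assumptions -- check your real payloads.
  if (message?.type === "end-of-call-report") {
    await fetch("https://metrics.example.com/ingest", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        callId: message.call?.id,
        costUsd: message.cost,
        durationSeconds: message.durationSeconds,
        endedReason: message.endedReason,
      }),
    });
  }

  // Acknowledge quickly so the sender does not retry.
  res.status(200).json({ received: true });
});

app.listen(3000);
```

The same handler shape works for Datadog, Grafana, or Braintrust; only the ingest call changes.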

---

## Alerting Strategies

**[Placeholder - Full detail section]**

<span className="internal-note">Are built-in alerts on the roadmap?</span>

---

## Monitoring Best Practices

**[Placeholder - Full detail section]**

Topics to cover:
- Define baseline metrics
- Set alert thresholds (critical, warning, informational) (see the sketch after this list)
- Monitor continuously, not reactively
- Create role-specific dashboards
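
To make the threshold tiers concrete, here is an illustrative encoding of alert rules evaluated against aggregated metrics. The metric names and threshold values are placeholders, not Vapi defaults; derive real values from your own baseline data.

```typescript
type Severity = "critical" | "warning" | "info";

interface AlertRule {
  metric: string; // e.g. a scalar Structured Output field or call stat
  threshold: number;
  comparison: "above" | "below";
  severity: Severity;
}

// Placeholder thresholds; set these from your observed baselines.
const rules: AlertRule[] = [
  { metric: "successRate", threshold: 0.8, comparison: "below", severity: "critical" },
  { metric: "successRate", threshold: 0.9, comparison: "below", severity: "warning" },
  { metric: "avgCostUsd", threshold: 0.5, comparison: "above", severity: "warning" },
  { metric: "avgLatencyMs", threshold: 1500, comparison: "above", severity: "info" },
];

// Return every rule that fires; in practice you would keep only the
// most severe match per metric before paging anyone.
function evaluate(current: Record<string, number>): AlertRule[] {
  return rules.filter((r) => {
    const value = current[r.metric];
    if (value === undefined) return false;
    return r.comparison === "below" ? value < r.threshold : value > r.threshold;
  });
}
```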

---

## What you'll learn in detailed guides

- [Boards quickstart](/observability/boards-quickstart) — Create custom dashboards in minutes
- (Planned) Langfuse integration guide — Set up real-time observability
- (Planned) Webhook monitoring guide — Export to external platforms
- (Planned) Analytics API reference — Programmatic monitoring

---

## Key takeaway

**Monitor continuously**. Production issues caught early (via dashboards or alerts) are easier to fix than issues discovered through customer complaints.

Operating a voice AI system requires proactive monitoring. Set up visibility on day one of production launch.

---

## Next steps

<CardGroup cols={2}>
  <Card
    title="Boards quickstart"
    icon="chart-line"
    href="/observability/boards-quickstart"
  >
    Build your first monitoring dashboard
  </Card>

  <Card
    title="Optimization workflows"
    icon="arrow-trend-up"
    href="/observability/optimization-workflows"
  >
    Next stage: Use monitoring data to improve
  </Card>

  <Card
    title="Back to overview"
    icon="arrow-left"
    href="/observability/framework"
  >
    Return to observability framework
  </Card>
</CardGroup>

fern/observability/optimization-workflows.mdx

Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
---
title: Optimization workflows
subtitle: Use observability data to continuously improve your assistant
slug: observability/optimization-workflows
---

## What is optimization?

**Optimization** is the continuous improvement loop: using observability data to refine prompts, improve tool calls, and enhance conversation flows.

Unlike the previous stages (INSTRUMENT, TEST, EXTRACT, MONITOR), **OPTIMIZE is not a dedicated tool or feature** — it's a workflow that combines tools from all previous stages to drive systematic improvement.

**The optimization mindset**: Voice AI quality improves through iteration, not perfection. The best teams:

- Start with "good enough" (not perfect)
- Deploy to production with instrumentation and monitoring
- Use real-world data to identify improvement opportunities
- Test changes before deploying
- Track impact systematically

**Why optimization matters**: Without a systematic optimization workflow, teams fall into one of three traps:

- ❌ Over-engineer before launch (trying to predict every edge case)
- ❌ React to problems ad-hoc (fixing symptoms, not root causes)
- ❌ Stagnate after launch (no process for continuous improvement)

**The goal**: Establish a repeatable workflow that turns observability data into measurable improvements.

---

## Optimization workflow at a glance

| Stage | Tools used | What you do |
|-------|-----------|-------------|
| **1. Detect patterns** | Boards, Insights API, Analytics API | Spot trends in monitoring dashboards (success rate dropping, cost increasing, etc.) |
| **2. Extract details** | Webhooks, Structured Outputs, Transcripts | Pull call data to understand WHY the pattern exists |
| **3. Form hypothesis** | Manual analysis | Identify root cause (e.g., "prompt doesn't handle edge case X") |
| **4. Make changes** | Assistant configuration | Update prompts, tools, routing logic based on hypothesis |
| **5. Test changes** | Evals, Simulations | Validate improvement before deploying to production |
| **6. Deploy** | API, Dashboard | Push updated assistant to production |
| **7. Verify** | Boards, Insights API | Track target metric to confirm improvement |

This is a **continuous cycle**, not a one-time activity:

```
MONITOR → EXTRACT → Analyze → Revise → TEST → Deploy → MONITOR (repeat)
```

<span className="vapi-validation">Confirm this optimization workflow accurately reflects how Vapi customers typically iterate on their assistants. Are there tools or stages we're missing? Should we emphasize certain steps more than others?</span>

---

## The optimization loop in detail

**[Placeholder - Full detail sections]**

### Step 1: Detect patterns from monitoring

<span className="internal-note">Placeholder for: How to use Boards/analytics to spot trends (success rate drops, cost spikes, etc.). Include example scenario.</span>

---

### Step 2: Extract detailed data

<span className="internal-note">Placeholder for: Methods for pulling call transcripts, structured outputs, tool call logs. Show how to filter/export data for analysis.</span>
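
As a sketch of this step, the snippet below pulls recent calls for one assistant so transcripts and structured outputs can be analyzed offline. It assumes Vapi's List Calls endpoint (`GET /call`) with `assistantId`, `createdAtGt`, and `limit` query parameters; confirm the parameter names against the current API reference.

```typescript
const VAPI_API_KEY = process.env.VAPI_API_KEY!;

// Fetch up to 100 calls for an assistant created after `sinceIso`.
// Parameter names are best guesses -- verify against the API reference.
async function fetchRecentCalls(assistantId: string, sinceIso: string) {
  const params = new URLSearchParams({
    assistantId,
    createdAtGt: sinceIso,
    limit: "100",
  });
  const res = await fetch(`https://api.vapi.ai/call?${params}`, {
    headers: { Authorization: `Bearer ${VAPI_API_KEY}` },
  });
  if (!res.ok) throw new Error(`List calls failed: ${res.status}`);
  return res.json();
}

// Example: pull the last week of calls, then inspect transcripts and
// structured outputs offline to understand WHY the pattern exists.
const since = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000).toISOString();
const calls = await fetchRecentCalls("your-assistant-id", since);
```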

---

### Step 3: Form a hypothesis

<span className="internal-note">Placeholder for: Common hypothesis patterns (prompt issues, tool description problems, routing logic, verbosity, etc.). Show example hypothesis formation process.</span>

---

### Step 4: Make targeted changes

<span className="internal-note">Placeholder for: How to revise prompts, update tool descriptions, refine conversation flows. Include before/after examples.</span>

---

### Step 5: Test before deploying

<span className="internal-note">Placeholder for: Creating Evals for specific failure cases, regression testing strategies. Show example test structure.</span>

---

### Step 6: Deploy

<span className="internal-note">Placeholder for: Deployment strategies (direct deploy, staged rollout, A/B testing). Include decision framework for choosing strategy.</span>
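
For the direct-deploy case, a minimal sketch might look like the following. It assumes a `PATCH /assistant/{id}` endpoint and a `model.messages` payload carrying the system prompt; confirm the exact shape (and whether a partial `model` object merges with or replaces existing settings) against the current API reference before relying on it.

```typescript
const VAPI_API_KEY = process.env.VAPI_API_KEY!;

// Push a revised system prompt to a production assistant.
// Payload shape is an assumption -- check the API reference first.
async function deployPromptUpdate(assistantId: string, systemPrompt: string) {
  const res = await fetch(`https://api.vapi.ai/assistant/${assistantId}`, {
    method: "PATCH",
    headers: {
      Authorization: `Bearer ${VAPI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: {
        provider: "openai",
        model: "gpt-4o",
        messages: [{ role: "system", content: systemPrompt }],
      },
    }),
  });
  if (!res.ok) throw new Error(`Deploy failed: ${res.status}`);
  return res.json();
}
```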

---

### Step 7: Verify improvement

<span className="internal-note">Placeholder for: Time windows for verification (immediate, 24h, 1 week), what to track, when to roll back.</span>
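
To sketch the verification decision, the helper below compares a target metric across before/after windows and flags a rollback when the change made things worse by more than a tolerance. The window lengths and tolerance are illustrative choices, not Vapi recommendations.

```typescript
interface VerificationResult {
  delta: number; // after minus before, in metric units (e.g. success rate)
  improved: boolean;
  shouldRollBack: boolean;
}

// Compare a metric across two observation windows. `tolerance` absorbs
// normal day-to-day noise before declaring a regression.
function verifyChange(
  before: number,
  after: number,
  tolerance = 0.02,
): VerificationResult {
  const delta = after - before;
  return {
    delta,
    improved: delta > tolerance,
    shouldRollBack: delta < -tolerance,
  };
}

// Example: success rate was 0.84 the week before the change, 0.91 after.
const result = verifyChange(0.84, 0.91);
// → { delta: 0.07, improved: true, shouldRollBack: false }
```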

---

## Common optimization scenarios

**[Placeholder - Table of common patterns, root causes, and optimization actions]**

<span className="vapi-validation">What are the most common optimization scenarios Vapi customers encounter? What issues drive the most improvement iterations? Are there voice-specific optimization patterns we should highlight?</span>

---

## Optimization best practices

**[Placeholder - Full detail sections]**

Topics to cover:
- Start with high-impact, low-effort changes
- Track improvement over time (optimization log; sketched below)
- Don't optimize prematurely (wait for data)
- Make one change at a time (clear cause-and-effect)
- Maintain regression tests
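
One lightweight way to keep that optimization log is a typed record per change; the structure below is a suggestion, not a Vapi feature. One entry per change preserves the cause-and-effect trail the practices above call for.

```typescript
interface OptimizationLogEntry {
  date: string; // when the change shipped
  pattern: string; // what monitoring surfaced
  hypothesis: string; // suspected root cause
  change: string; // the single change made
  evalResult: "pass" | "fail"; // pre-deploy test outcome
  metricBefore: number; // e.g. success rate over the prior week
  metricAfter?: number; // filled in during verification
  rolledBack?: boolean;
}

// Hypothetical example entry, for illustration only.
const log: OptimizationLogEntry[] = [
  {
    date: "2025-01-15",
    pattern: "Success rate dropped 6% week-over-week",
    hypothesis: "Prompt doesn't handle callers asking for refunds",
    change: "Added refund-handling instructions to system prompt",
    evalResult: "pass",
    metricBefore: 0.84,
    metricAfter: 0.91,
  },
];
```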

<span className="internal-note">Should we include specific guidance on optimization cadence (weekly reviews, monthly deep dives, quarterly retrospectives)?</span>

---

## What you'll learn in detailed guides

**Optimization is cross-functional** — it references tools from all previous stages:

- [Evals quickstart](/observability/evals-quickstart) — Test improvements before deploying
- [Boards quickstart](/observability/boards-quickstart) — Track metrics over time
- [Structured outputs quickstart](/assistants/structured-outputs-quickstart) — Extract failure data for analysis
- (Planned) Optimization playbook — Common scenarios and solutions
- (Planned) Advanced optimization — A/B testing, staged rollouts, impact measurement

---

## Key takeaway

**Optimize continuously**. The best teams treat observability as a loop: instrument → test → deploy → monitor → identify improvements → repeat. Data-driven iteration beats guesswork.

Start your optimization practice on day one. Don't wait until you have problems — establish the workflow while things are working, so you're ready when issues arise.

---

## Next steps

<CardGroup cols={2}>
  <Card
    title="Boards quickstart"
    icon="chart-line"
    href="/observability/boards-quickstart"
  >
    Set up monitoring to detect patterns
  </Card>

  <Card
    title="Evals quickstart"
    icon="clipboard-check"
    href="/observability/evals-quickstart"
  >
    Build tests to validate improvements
  </Card>

  <Card
    title="Production readiness"
    icon="check-circle"
    href="/observability/production-readiness"
  >
    Validate you're ready to optimize in production
  </Card>

  <Card
    title="Back to overview"
    icon="arrow-left"
    href="/observability/framework"
  >
    Return to observability framework
  </Card>
</CardGroup>
