Skip to content

Commit c769daa

Browse files
docs: add data flow documentation for custom storage and models
Adds comprehensive documentation explaining data flow when using custom bucket storage and custom LLM configurations in Vapi. Includes: - Mermaid diagrams for default, custom storage, custom LLM, and combined flows - Data storage summary table - Use case recommendations - Cross-references to related documentation Addresses customer inquiries about what data is passed and stored in Vapi vs their own systems. VAP-11220 Co-Authored-By: Claude <noreply@anthropic.com>
1 parent fd4fe10 commit c769daa

1 file changed

Lines changed: 230 additions & 0 deletions

File tree

Lines changed: 230 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,230 @@
1+
---
2+
title: Data Flow
3+
subtitle: Understand how data flows through Vapi when using custom storage and custom models
4+
slug: security-and-privacy/data-flow
5+
---
6+
7+
## Overview
8+
9+
When using Vapi, data flows through multiple components during a voice conversation. Understanding this flow is essential for security-conscious organizations, especially when integrating custom bucket storage or custom LLM providers.
10+
11+
**This guide explains:**
12+
- What data passes through Vapi during calls
13+
- What data is stored on Vapi's infrastructure vs your own
14+
- How custom configurations change the data flow
15+
16+
## Default Data Flow
17+
18+
In the default configuration, Vapi handles all components of the voice pipeline and stores call artifacts on Vapi's infrastructure.
19+
20+
```mermaid
21+
flowchart LR
22+
subgraph User["User"]
23+
A[Voice Input]
24+
end
25+
26+
subgraph Vapi["Vapi Infrastructure"]
27+
B[Speech-to-Text]
28+
C[Orchestration Layer]
29+
D[LLM Processing]
30+
E[Text-to-Speech]
31+
F[(Vapi Storage)]
32+
end
33+
34+
subgraph Stored["Stored on Vapi"]
35+
G[Call Recordings]
36+
H[Transcriptions]
37+
I[Call Logs]
38+
J[Analytics]
39+
end
40+
41+
A --> B --> C --> D --> E --> A
42+
C --> F
43+
F --> G & H & I & J
44+
```
45+
46+
**In default mode, Vapi stores:**
47+
- Call recordings (audio files)
48+
- Transcriptions (full conversation text)
49+
- Call logs and metadata
50+
- Analytics and structured outputs
51+
52+
## Custom Bucket Storage Data Flow
53+
54+
When you configure custom bucket storage (AWS S3, GCP Cloud Storage, or Cloudflare R2), call recordings are uploaded directly to your storage instead of Vapi's infrastructure.
55+
56+
```mermaid
57+
flowchart LR
58+
subgraph User["User"]
59+
A[Voice Input]
60+
end
61+
62+
subgraph Vapi["Vapi Infrastructure"]
63+
B[Speech-to-Text]
64+
C[Orchestration Layer]
65+
D[LLM Processing]
66+
E[Text-to-Speech]
67+
end
68+
69+
subgraph Customer["Your Infrastructure"]
70+
F[(Your Cloud Bucket)]
71+
G[Call Recordings]
72+
end
73+
74+
subgraph VapiStored["Stored on Vapi"]
75+
H[Call Metadata]
76+
I[Analytics]
77+
end
78+
79+
A --> B --> C --> D --> E --> A
80+
C -.->|Recording Upload| F
81+
F --> G
82+
C --> H & I
83+
```
84+
85+
**With custom bucket storage:**
86+
- **Your infrastructure stores:** Call recordings
87+
- **Vapi stores:** Call metadata, analytics, transcriptions (unless HIPAA mode enabled)
88+
- **Passes through Vapi:** Audio streams (not persisted after processing)
89+
90+
<Note>
91+
Configure custom bucket storage in **Provider Credentials > Cloud Providers** in the Vapi Dashboard. See [AWS S3](/providers/cloud/s3), [GCP Cloud Storage](/providers/cloud/gcp), or [Cloudflare R2](/providers/cloud/cloudflare) for setup instructions.
92+
</Note>
93+
94+
## Custom LLM Data Flow
95+
96+
When you bring your own LLM server, conversation context is sent to your infrastructure for processing instead of Vapi's default LLM providers.
97+
98+
```mermaid
99+
flowchart LR
100+
subgraph User["User"]
101+
A[Voice Input]
102+
end
103+
104+
subgraph Vapi["Vapi Infrastructure"]
105+
B[Speech-to-Text]
106+
C[Orchestration Layer]
107+
E[Text-to-Speech]
108+
F[(Vapi Storage)]
109+
end
110+
111+
subgraph Customer["Your Infrastructure"]
112+
D[Your LLM Server]
113+
end
114+
115+
subgraph Stored["Stored on Vapi"]
116+
G[Call Recordings]
117+
H[Transcriptions]
118+
I[Call Logs]
119+
end
120+
121+
A --> B --> C
122+
C <-->|Conversation Context| D
123+
C --> E --> A
124+
C --> F --> G & H & I
125+
```
126+
127+
**With custom LLM:**
128+
- **Your infrastructure processes:** All LLM requests (prompts, conversation history, tool calls)
129+
- **Vapi stores:** Call recordings, transcriptions, logs (unless HIPAA mode enabled)
130+
- **Passes through Vapi:** Transcribed text sent to your LLM, LLM responses
131+
132+
<Note>
133+
See [Bring Your Own Server](/customization/custom-llm/using-your-server) for custom LLM setup instructions.
134+
</Note>
135+
136+
## Combined Configuration: Custom Storage + Custom LLM
137+
138+
For maximum data control, combine custom bucket storage with a custom LLM. This configuration minimizes data stored on Vapi's infrastructure.
139+
140+
```mermaid
141+
flowchart LR
142+
subgraph User["User"]
143+
A[Voice Input]
144+
end
145+
146+
subgraph Vapi["Vapi Infrastructure"]
147+
B[Speech-to-Text]
148+
C[Orchestration Layer]
149+
E[Text-to-Speech]
150+
end
151+
152+
subgraph Customer["Your Infrastructure"]
153+
D[Your LLM Server]
154+
F[(Your Cloud Bucket)]
155+
end
156+
157+
subgraph VapiStored["Stored on Vapi"]
158+
G[Call Metadata Only]
159+
end
160+
161+
A --> B --> C
162+
C <-->|Conversation Context| D
163+
C --> E --> A
164+
C -.->|Recording Upload| F
165+
C --> G
166+
```
167+
168+
**With both custom storage and custom LLM:**
169+
- **Your infrastructure handles:** LLM processing, call recording storage
170+
- **Vapi stores:** Minimal call metadata (call ID, timestamps, duration)
171+
- **Passes through Vapi:** Audio streams, transcribed text (ephemeral)
172+
173+
## Data Storage Summary
174+
175+
The following table summarizes where data is stored based on your configuration:
176+
177+
| Data Type | Default | Custom Storage | Custom LLM | Custom Storage + LLM | HIPAA Mode |
178+
|-----------|---------|----------------|------------|---------------------|------------|
179+
| Call recordings | Vapi | **Your bucket** | Vapi | **Your bucket** | Not stored |
180+
| Transcriptions | Vapi | Vapi | Vapi | Vapi | Not stored |
181+
| LLM prompts/responses | Vapi's LLM provider | Vapi's LLM provider | **Your server** | **Your server** | Your server (compliant providers) |
182+
| Call logs | Vapi | Vapi | Vapi | Vapi | Not stored |
183+
| Call metadata | Vapi | Vapi | Vapi | Vapi | Vapi |
184+
| Structured outputs | Vapi | Vapi | Vapi | Vapi | Not stored (unless explicitly enabled) |
185+
186+
<Note>
187+
**HIPAA Mode** (`hipaaEnabled: true`) ensures no call logs, recordings, or transcriptions are stored on Vapi. See [HIPAA Compliance](/security-and-privacy/hipaa) for details.
188+
</Note>
189+
190+
## What Data Passes Through Vapi
191+
192+
Even with custom configurations, certain data passes through Vapi's orchestration layer during calls:
193+
194+
| Data | Purpose | Retention |
195+
|------|---------|-----------|
196+
| Audio streams | Real-time processing by STT/TTS | Ephemeral (not stored) |
197+
| Transcribed text | Sent to LLM for response generation | Stored in logs (unless HIPAA mode) |
198+
| LLM responses | Converted to speech | Stored in logs (unless HIPAA mode) |
199+
| Call signaling | SIP/WebRTC connection management | Metadata only |
200+
201+
## Recommendations by Use Case
202+
203+
<AccordionGroup>
204+
<Accordion title="I need to comply with data residency requirements">
205+
Use **custom bucket storage** with a bucket in your required region. This ensures call recordings stay within your geographic boundaries. For LLM processing, use a **custom LLM** hosted in the required region or use provider keys for an LLM with regional endpoints.
206+
</Accordion>
207+
208+
<Accordion title="I want to minimize data stored on third-party infrastructure">
209+
Enable **custom bucket storage** + **custom LLM** + **HIPAA mode**. This configuration:
210+
- Routes recordings to your storage
211+
- Routes LLM processing to your server
212+
- Prevents Vapi from storing logs, transcriptions, or recordings
213+
</Accordion>
214+
215+
<Accordion title="I need HIPAA compliance">
216+
Enable `hipaaEnabled: true` at the organization or assistant level. Use only [HIPAA-compliant providers](/security-and-privacy/hipaa#hipaa-compliant-providers) for STT, LLM, and TTS. Consider custom storage for additional control over PHI.
217+
</Accordion>
218+
219+
<Accordion title="I want call recordings but with my own storage">
220+
Configure **custom bucket storage** only. Vapi will continue to handle transcription, LLM processing, and analytics while uploading recordings to your bucket.
221+
</Accordion>
222+
</AccordionGroup>
223+
224+
## Next Steps
225+
226+
- **[AWS S3 Setup](/providers/cloud/s3)** - Configure S3 bucket storage
227+
- **[GCP Cloud Storage Setup](/providers/cloud/gcp)** - Configure GCP bucket storage
228+
- **[Custom LLM Integration](/customization/custom-llm/using-your-server)** - Bring your own LLM server
229+
- **[HIPAA Compliance](/security-and-privacy/hipaa)** - Enable HIPAA mode for healthcare use cases
230+
- **[Provider Keys](/customization/provider-keys)** - Use your own API keys with Vapi's default providers

0 commit comments

Comments
 (0)