|
| 1 | +--- |
| 2 | +title: Data Flow |
| 3 | +subtitle: Understand how data flows through Vapi when using custom storage and custom models |
| 4 | +slug: security-and-privacy/data-flow |
| 5 | +--- |
| 6 | + |
| 7 | +## Overview |
| 8 | + |
| 9 | +When using Vapi, data flows through multiple components during a voice conversation. Understanding this flow is essential for security-conscious organizations, especially when integrating custom bucket storage or custom LLM providers. |
| 10 | + |
| 11 | +**This guide explains:** |
| 12 | +- What data passes through Vapi during calls |
| 13 | +- What data is stored on Vapi's infrastructure vs your own |
| 14 | +- How custom configurations change the data flow |
| 15 | + |
| 16 | +## Default Data Flow |
| 17 | + |
| 18 | +In the default configuration, Vapi handles all components of the voice pipeline and stores call artifacts on Vapi's infrastructure. |
| 19 | + |
| 20 | +```mermaid |
| 21 | +flowchart LR |
| 22 | + subgraph User["User"] |
| 23 | + A[Voice Input] |
| 24 | + end |
| 25 | + |
| 26 | + subgraph Vapi["Vapi Infrastructure"] |
| 27 | + B[Speech-to-Text] |
| 28 | + C[Orchestration Layer] |
| 29 | + D[LLM Processing] |
| 30 | + E[Text-to-Speech] |
| 31 | + F[(Vapi Storage)] |
| 32 | + end |
| 33 | + |
| 34 | + subgraph Stored["Stored on Vapi"] |
| 35 | + G[Call Recordings] |
| 36 | + H[Transcriptions] |
| 37 | + I[Call Logs] |
| 38 | + J[Analytics] |
| 39 | + end |
| 40 | + |
| 41 | + A --> B --> C --> D --> E --> A |
| 42 | + C --> F |
| 43 | + F --> G & H & I & J |
| 44 | +``` |
| 45 | + |
| 46 | +**In default mode, Vapi stores:** |
| 47 | +- Call recordings (audio files) |
| 48 | +- Transcriptions (full conversation text) |
| 49 | +- Call logs and metadata |
| 50 | +- Analytics and structured outputs |
| 51 | + |
| 52 | +## Custom Bucket Storage Data Flow |
| 53 | + |
| 54 | +When you configure custom bucket storage (AWS S3, GCP Cloud Storage, or Cloudflare R2), call recordings are uploaded directly to your storage instead of Vapi's infrastructure. |
| 55 | + |
| 56 | +```mermaid |
| 57 | +flowchart LR |
| 58 | + subgraph User["User"] |
| 59 | + A[Voice Input] |
| 60 | + end |
| 61 | + |
| 62 | + subgraph Vapi["Vapi Infrastructure"] |
| 63 | + B[Speech-to-Text] |
| 64 | + C[Orchestration Layer] |
| 65 | + D[LLM Processing] |
| 66 | + E[Text-to-Speech] |
| 67 | + end |
| 68 | + |
| 69 | + subgraph Customer["Your Infrastructure"] |
| 70 | + F[(Your Cloud Bucket)] |
| 71 | + G[Call Recordings] |
| 72 | + end |
| 73 | + |
| 74 | + subgraph VapiStored["Stored on Vapi"] |
| 75 | + H[Call Metadata] |
| 76 | + I[Analytics] |
| 77 | + end |
| 78 | + |
| 79 | + A --> B --> C --> D --> E --> A |
| 80 | + C -.->|Recording Upload| F |
| 81 | + F --> G |
| 82 | + C --> H & I |
| 83 | +``` |
| 84 | + |
| 85 | +**With custom bucket storage:** |
| 86 | +- **Your infrastructure stores:** Call recordings |
| 87 | +- **Vapi stores:** Call metadata, analytics, transcriptions (unless HIPAA mode enabled) |
| 88 | +- **Passes through Vapi:** Audio streams (not persisted after processing) |
| 89 | + |
| 90 | +<Note> |
| 91 | +Configure custom bucket storage in **Provider Credentials > Cloud Providers** in the Vapi Dashboard. See [AWS S3](/providers/cloud/s3), [GCP Cloud Storage](/providers/cloud/gcp), or [Cloudflare R2](/providers/cloud/cloudflare) for setup instructions. |
| 92 | +</Note> |
| 93 | + |
| 94 | +## Custom LLM Data Flow |
| 95 | + |
| 96 | +When you bring your own LLM server, conversation context is sent to your infrastructure for processing instead of Vapi's default LLM providers. |
| 97 | + |
| 98 | +```mermaid |
| 99 | +flowchart LR |
| 100 | + subgraph User["User"] |
| 101 | + A[Voice Input] |
| 102 | + end |
| 103 | + |
| 104 | + subgraph Vapi["Vapi Infrastructure"] |
| 105 | + B[Speech-to-Text] |
| 106 | + C[Orchestration Layer] |
| 107 | + E[Text-to-Speech] |
| 108 | + F[(Vapi Storage)] |
| 109 | + end |
| 110 | + |
| 111 | + subgraph Customer["Your Infrastructure"] |
| 112 | + D[Your LLM Server] |
| 113 | + end |
| 114 | + |
| 115 | + subgraph Stored["Stored on Vapi"] |
| 116 | + G[Call Recordings] |
| 117 | + H[Transcriptions] |
| 118 | + I[Call Logs] |
| 119 | + end |
| 120 | + |
| 121 | + A --> B --> C |
| 122 | + C <-->|Conversation Context| D |
| 123 | + C --> E --> A |
| 124 | + C --> F --> G & H & I |
| 125 | +``` |
| 126 | + |
| 127 | +**With custom LLM:** |
| 128 | +- **Your infrastructure processes:** All LLM requests (prompts, conversation history, tool calls) |
| 129 | +- **Vapi stores:** Call recordings, transcriptions, logs (unless HIPAA mode enabled) |
| 130 | +- **Passes through Vapi:** Transcribed text sent to your LLM, LLM responses |
| 131 | + |
| 132 | +<Note> |
| 133 | +See [Bring Your Own Server](/customization/custom-llm/using-your-server) for custom LLM setup instructions. |
| 134 | +</Note> |
| 135 | + |
| 136 | +## Combined Configuration: Custom Storage + Custom LLM |
| 137 | + |
| 138 | +For maximum data control, combine custom bucket storage with a custom LLM. This configuration minimizes data stored on Vapi's infrastructure. |
| 139 | + |
| 140 | +```mermaid |
| 141 | +flowchart LR |
| 142 | + subgraph User["User"] |
| 143 | + A[Voice Input] |
| 144 | + end |
| 145 | + |
| 146 | + subgraph Vapi["Vapi Infrastructure"] |
| 147 | + B[Speech-to-Text] |
| 148 | + C[Orchestration Layer] |
| 149 | + E[Text-to-Speech] |
| 150 | + end |
| 151 | + |
| 152 | + subgraph Customer["Your Infrastructure"] |
| 153 | + D[Your LLM Server] |
| 154 | + F[(Your Cloud Bucket)] |
| 155 | + end |
| 156 | + |
| 157 | + subgraph VapiStored["Stored on Vapi"] |
| 158 | + G[Call Metadata Only] |
| 159 | + end |
| 160 | + |
| 161 | + A --> B --> C |
| 162 | + C <-->|Conversation Context| D |
| 163 | + C --> E --> A |
| 164 | + C -.->|Recording Upload| F |
| 165 | + C --> G |
| 166 | +``` |
| 167 | + |
| 168 | +**With both custom storage and custom LLM:** |
| 169 | +- **Your infrastructure handles:** LLM processing, call recording storage |
| 170 | +- **Vapi stores:** Minimal call metadata (call ID, timestamps, duration) |
| 171 | +- **Passes through Vapi:** Audio streams, transcribed text (ephemeral) |
| 172 | + |
| 173 | +## Data Storage Summary |
| 174 | + |
| 175 | +The following table summarizes where data is stored based on your configuration: |
| 176 | + |
| 177 | +| Data Type | Default | Custom Storage | Custom LLM | Custom Storage + LLM | HIPAA Mode | |
| 178 | +|-----------|---------|----------------|------------|---------------------|------------| |
| 179 | +| Call recordings | Vapi | **Your bucket** | Vapi | **Your bucket** | Not stored | |
| 180 | +| Transcriptions | Vapi | Vapi | Vapi | Vapi | Not stored | |
| 181 | +| LLM prompts/responses | Vapi's LLM provider | Vapi's LLM provider | **Your server** | **Your server** | Your server (compliant providers) | |
| 182 | +| Call logs | Vapi | Vapi | Vapi | Vapi | Not stored | |
| 183 | +| Call metadata | Vapi | Vapi | Vapi | Vapi | Vapi | |
| 184 | +| Structured outputs | Vapi | Vapi | Vapi | Vapi | Not stored (unless explicitly enabled) | |
| 185 | + |
| 186 | +<Note> |
| 187 | +**HIPAA Mode** (`hipaaEnabled: true`) ensures no call logs, recordings, or transcriptions are stored on Vapi. See [HIPAA Compliance](/security-and-privacy/hipaa) for details. |
| 188 | +</Note> |
| 189 | + |
| 190 | +## What Data Passes Through Vapi |
| 191 | + |
| 192 | +Even with custom configurations, certain data passes through Vapi's orchestration layer during calls: |
| 193 | + |
| 194 | +| Data | Purpose | Retention | |
| 195 | +|------|---------|-----------| |
| 196 | +| Audio streams | Real-time processing by STT/TTS | Ephemeral (not stored) | |
| 197 | +| Transcribed text | Sent to LLM for response generation | Stored in logs (unless HIPAA mode) | |
| 198 | +| LLM responses | Converted to speech | Stored in logs (unless HIPAA mode) | |
| 199 | +| Call signaling | SIP/WebRTC connection management | Metadata only | |
| 200 | + |
| 201 | +## Recommendations by Use Case |
| 202 | + |
| 203 | +<AccordionGroup> |
| 204 | + <Accordion title="I need to comply with data residency requirements"> |
| 205 | + Use **custom bucket storage** with a bucket in your required region. This ensures call recordings stay within your geographic boundaries. For LLM processing, use a **custom LLM** hosted in the required region or use provider keys for an LLM with regional endpoints. |
| 206 | + </Accordion> |
| 207 | + |
| 208 | + <Accordion title="I want to minimize data stored on third-party infrastructure"> |
| 209 | + Enable **custom bucket storage** + **custom LLM** + **HIPAA mode**. This configuration: |
| 210 | + - Routes recordings to your storage |
| 211 | + - Routes LLM processing to your server |
| 212 | + - Prevents Vapi from storing logs, transcriptions, or recordings |
| 213 | + </Accordion> |
| 214 | + |
| 215 | + <Accordion title="I need HIPAA compliance"> |
| 216 | + Enable `hipaaEnabled: true` at the organization or assistant level. Use only [HIPAA-compliant providers](/security-and-privacy/hipaa#hipaa-compliant-providers) for STT, LLM, and TTS. Consider custom storage for additional control over PHI. |
| 217 | + </Accordion> |
| 218 | + |
| 219 | + <Accordion title="I want call recordings but with my own storage"> |
| 220 | + Configure **custom bucket storage** only. Vapi will continue to handle transcription, LLM processing, and analytics while uploading recordings to your bucket. |
| 221 | + </Accordion> |
| 222 | +</AccordionGroup> |
| 223 | + |
| 224 | +## Next Steps |
| 225 | + |
| 226 | +- **[AWS S3 Setup](/providers/cloud/s3)** - Configure S3 bucket storage |
| 227 | +- **[GCP Cloud Storage Setup](/providers/cloud/gcp)** - Configure GCP bucket storage |
| 228 | +- **[Custom LLM Integration](/customization/custom-llm/using-your-server)** - Bring your own LLM server |
| 229 | +- **[HIPAA Compliance](/security-and-privacy/hipaa)** - Enable HIPAA mode for healthcare use cases |
| 230 | +- **[Provider Keys](/customization/provider-keys)** - Use your own API keys with Vapi's default providers |
0 commit comments