The Twilio Synthetic Call Data Generator is built on a serverless architecture using Twilio Functions, OpenAI gpt-4o-mini, and Segment CDP to create realistic customer-agent conversations with intelligent pairing and comprehensive analytics.
┌─────────────────────────────────────────────────────────────────────┐
│                         Orchestration Layer                         │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │  src/main.js - Conference Orchestrator                      │   │
│   │  • Load personas (customers + agents)                       │   │
│   │  • Intelligent pairing (complexity-based matching)          │   │
│   │  • Create Segment profiles                                  │   │
│   │  • Initiate Twilio conferences                              │   │
│   └─────────────────────────────────────────────────────────────┘   │
└──────────────────────────┬──────────────────────────────────────────┘
                           │
           ┌───────────────┼───────────────┐
           ▼               ▼               ▼
      ┌──────────┐    ┌──────────┐    ┌──────────┐
      │ Customer │    │  Agent   │    │ Segment  │
      │ Personas │    │ Personas │    │   CDP    │
      └──────────┘    └──────────┘    └──────────┘
           │               │               │
           └───────────────┴───────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       Twilio Serverless Layer                       │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐            │
│   │voice-handler │ → │  transcribe  │ → │   respond    │            │
│   │  Conference  │   │  STT/Gather  │   │  OpenAI LLM  │            │
│   │   Routing    │   │    TwiML     │   │   Response   │            │
│   └──────────────┘   └──────────────┘   └──────────────┘            │
│          │                  │                  │                    │
│          │                  ▼                  │                    │
│          │           ┌──────────────┐          │                    │
│          │           │ Twilio Sync  │◄─────────┘                    │
│          │           │ Conversation │                               │
│          │           │    State     │                               │
│          │           └──────────────┘                               │
│          ▼                                                          │
│   ┌──────────────────────────────────────────┐                      │
│   │  conference-status-webhook               │                      │
│   │  • call.completed                        │                      │
│   │  • conference.completed                  │                      │
│   │  • recording.completed                   │                      │
│   └──────────────────────────────────────────┘                      │
│                  │                                                  │
│                  ▼                                                  │
│   ┌──────────────────────────────────────────┐                      │
│   │  transcription-webhook                   │                      │
│   │  • Auto-transcription via Voice Intel    │                      │
│   │  • Sentiment analysis                    │                      │
│   │  • Operator results                      │                      │
│   └──────────────────────────────────────────┘                      │
└──────────────────┬──────────────────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                            Data Pipeline                            │
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐            │
│   │    Twilio    │   │    Voice     │   │   Segment    │            │
│   │  Recording   │   │ Intelligence │   │   Profiles   │            │
│   │   Storage    │   │ Transcripts  │   │   + Events   │            │
│   └──────────────┘   └──────────────┘   └──────────────┘            │
└─────────────────────────────────────────────────────────────────────┘
sequenceDiagram
participant Orch as Orchestrator
participant Seg as Segment CDP
participant Twilio as Twilio API
participant VH as voice-handler
participant Trans as transcribe
participant Resp as respond
participant AI as OpenAI gpt-4o-mini
participant Sync as Twilio Sync
participant VI as Voice Intelligence
Orch->>Orch: Load personas & pair
Orch->>Seg: Create customer profiles
Orch->>Twilio: Create conference
Twilio->>VH: POST /voice-handler (agent)
VH->>Trans: Redirect to /transcribe (isFirstCall=true)
Trans->>Trans: Agent speaks greeting (no <Gather>)
Trans->>Trans: Redirect to /transcribe (isFirstCall=false)
Trans->>Trans: <Gather> for speech
Twilio->>VH: POST /voice-handler (customer)
VH->>Trans: Redirect to /transcribe (isFirstCall=false)
Trans->>Trans: <Gather> for speech
Note over Trans,Resp: Conversation Loop (10-20 turns)
Trans->>Resp: POST /respond with SpeechResult
Resp->>Sync: Get conversation history
Resp->>AI: chat.completions.create()
AI-->>Resp: AI response
Resp->>Sync: Store updated history
Resp->>Trans: <Say> response + redirect
Note over Twilio,VI: Post-Call Processing
Twilio->>VI: Auto-transcribe (Voice Intelligence)
VI->>Twilio: Transcript + Operator Results
Twilio->>Seg: Send events via Event Streams
Seg->>Seg: Update customer profiles
File: src/main.js
Responsibilities:
- Load customer and agent personas from JSON
- Intelligent pairing based on issue complexity and agent competence
- Create Segment customer profiles
- Initiate Twilio conferences with both participants
- Track conference lifecycle
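The source does not show how pairing is scored, so the following is only a minimal sketch of complexity-based matching. It assumes each customer persona carries a numeric `complexity` and each agent persona a numeric `competence`; both field names are hypothetical and not taken from the project's persona schema.

```javascript
// Hypothetical pairing sketch: `complexity` and `competence` are assumed
// field names, not the project's actual persona schema.
function selectPair(customer, agents) {
  if (agents.length === 0) {
    throw new Error('No agents available for pairing');
  }
  // Prefer agents who can handle the issue. If none can, fall back to
  // the full list so the call can still be placed.
  const capable = agents.filter((a) => a.competence >= customer.complexity);
  const pool = capable.length > 0 ? capable : agents;
  // Within the pool, pick the agent whose competence is closest to the
  // issue complexity, so senior agents stay free for harder cases.
  const agent = pool.reduce((bestSoFar, candidate) => {
    const gap = Math.abs(candidate.competence - customer.complexity);
    const bestGap = Math.abs(bestSoFar.competence - customer.complexity);
    return gap < bestGap ? candidate : bestSoFar;
  });
  return { customer, agent };
}

const agents = [
  { name: 'junior', competence: 2 },
  { name: 'mid', competence: 3 },
  { name: 'senior', competence: 5 },
];
const pair = selectPair({ name: 'cust-1', complexity: 3 }, agents);
console.log(pair.agent.name); // prints "mid"
```

Choosing the least over-qualified capable agent is one possible policy; the real orchestrator may weight other persona attributes as well.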
Key Functions:
// Main workflow
loadCustomers() → loadAgents() → selectPair() →
createSegmentProfile() → createConference() → addParticipants()
File: functions/voice-handler.js
Purpose: Initial webhook when participant joins conference
Flow:
- Receives webhook from Twilio when participant joins
- Extracts role (agent/customer) and persona from conference name
- Routes to /transcribe with the appropriate isFirstCall flag
  - Agent: isFirstCall=true (speaks first)
  - Customer: isFirstCall=false (listens first)
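The routing decision above can be illustrated with a small pure function. This is a sketch only: the query parameter names other than `isFirstCall` (`role`, `persona`, `conferenceId`) and the persona values are assumptions for illustration, and the step that parses the conference name is omitted because its format is not given here.

```javascript
// Sketch of the routing decision only. Parameter names other than
// `isFirstCall` are assumptions, not the project's actual query string.
function buildTranscribeUrl({ role, persona, conferenceId }) {
  if (role !== 'agent' && role !== 'customer') {
    throw new Error(`Unknown role: ${role}`);
  }
  const params = new URLSearchParams({
    role,
    persona,
    conferenceId,
    // The agent speaks first; the customer starts by listening.
    isFirstCall: String(role === 'agent'),
  });
  return `/transcribe?${params.toString()}`;
}

const agentUrl = buildTranscribeUrl({
  role: 'agent',
  persona: 'agent-01',
  conferenceId: 'conf-1',
});
const customerUrl = buildTranscribeUrl({
  role: 'customer',
  persona: 'cust-07',
  conferenceId: 'conf-1',
});
console.log(agentUrl);
console.log(customerUrl);
```

In a Twilio Function, the returned path would typically be passed to `twiml.redirect()`.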
File: functions/transcribe.js
Purpose: Capture speech using <Gather> element
Turn-Taking Logic:
if (isFirstCall === 'true' && role === 'agent') {
  // Agent first turn: speak greeting WITHOUT <Gather>
  twiml.say({ voice: 'Polly.Joanna-Neural' }, introduction);
  twiml.redirect('/transcribe?isFirstCall=false');
} else {
  // Normal turn: listen for speech WITH <Gather>
  const gather = twiml.gather({
    input: 'speech',
    action: '/respond',
    speechTimeout: 'auto',
    speechModel: 'experimental_conversations'
  });
  gather.say({ voice: 'Polly.Joanna-Neural' }, 'Listening...');
}
Key Features:
- Speech-to-text using Twilio's speech recognition
- Automatic silence detection (speechTimeout: 'auto')
- Enhanced accuracy with the experimental_conversations model
File: functions/respond.js
Purpose: Process transcribed speech and generate AI response
Flow:
- Receive SpeechResult from /transcribe
- Load persona data (agent or customer characteristics)
- Retrieve conversation history from Twilio Sync
- Call OpenAI gpt-4o-mini with conversation context
- Store updated conversation in Sync
- Return TwiML with <Say> response
- Redirect back to /transcribe for next turn
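As a rough illustration of the middle steps of this flow, the sketch below assembles the `messages` array that would be passed to `chat.completions.create()` and appends the finished turn to the history. The persona fields (`name`, `role`, `traits`) and the prompt wording are hypothetical; the Sync and OpenAI calls themselves are left out.

```javascript
// Sketch: build the chat messages for one turn. Persona fields and the
// prompt wording are assumptions made for illustration.
function buildSystemPrompt(persona) {
  return (
    `You are ${persona.name}, a ${persona.role} on a support call. ` +
    `Traits: ${persona.traits.join(', ')}. Keep replies short and spoken.`
  );
}

function buildMessages(persona, history, speechResult) {
  return [
    { role: 'system', content: buildSystemPrompt(persona) },
    // Prior turns, already stored as { role, content } objects.
    ...history,
    // What the other party just said becomes the next user message.
    { role: 'user', content: speechResult },
  ];
}

function appendTurn(history, speechResult, reply) {
  // Return a new array so the stored history is never mutated in place.
  return [
    ...history,
    { role: 'user', content: speechResult },
    { role: 'assistant', content: reply },
  ];
}

const persona = { name: 'Alex', role: 'support agent', traits: ['patient', 'concise'] };
const history = [{ role: 'assistant', content: 'Thanks for calling. How can I help?' }];
const messages = buildMessages(persona, history, 'My router keeps dropping.');
console.log(messages.map((m) => m.role).join(' > ')); // prints "system > assistant > user"
```

From each participant's point of view, its own earlier lines are `assistant` messages and the other party's speech arrives as `user` messages.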
State Management:
- Uses Twilio Sync Documents keyed by conferenceId
- Stores conversation as JSON array of messages
- TTL: 1 hour (conversations auto-expire)
Rate Limiting:
- Checks daily call count in Sync before calling OpenAI
- Respects MAX_DAILY_CALLS environment variable
- Returns error message if limit exceeded
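A minimal sketch of the cap check follows. The counter shape (`{ date, count }`) and the fallback value of 100 are assumptions; only the `MAX_DAILY_CALLS` variable name comes from this document. The function is pure, so the read from and write back to Sync would wrap around it.

```javascript
// Sketch of the daily cap check. The counter shape ({ date, count }) is
// an assumption; the project may store it differently in Sync.
function checkDailyLimit(counter, maxDailyCalls, today) {
  // A counter left over from a previous day is treated as reset.
  const current = counter && counter.date === today ? counter.count : 0;
  if (current >= maxDailyCalls) {
    return { allowed: false, counter: { date: today, count: current } };
  }
  return { allowed: true, counter: { date: today, count: current + 1 } };
}

// Hypothetical default of 100 when the variable is unset.
const maxDailyCalls = Number.parseInt(process.env.MAX_DAILY_CALLS || '100', 10);
const today = new Date().toISOString().slice(0, 10); // e.g. "2025-01-31"

const decision = checkDailyLimit({ date: today, count: maxDailyCalls }, maxDailyCalls, today);
console.log(decision.allowed); // prints false: the cap is already reached
```

Because this is a read-then-write check, concurrent requests could both pass it; an atomic or conditional update on the store side is still needed to enforce the cap strictly.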
File: functions/conference-status-webhook.js
Purpose: Handle conference lifecycle events
Events Handled:
- conference.created
- conference.completed
- call.completed
- recording.completed
Actions:
- Log conference durations
- Track participant join/leave
- Trigger post-processing workflows
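One way to organize such a webhook is a small dispatch table keyed by event name. The sketch below uses the event names listed above; the payload fields (`event`, `conferenceId`, `callSid`, `durationSeconds`, `recordingSid`) are hypothetical and do not reflect the actual Twilio callback parameters.

```javascript
// Sketch of a status-event dispatcher. Payload field names are
// hypothetical, chosen only to make the example self-contained.
const statusHandlers = {
  'conference.created': (e) => `conference ${e.conferenceId} created`,
  'conference.completed': (e) =>
    `conference ${e.conferenceId} completed after ${e.durationSeconds}s`,
  'call.completed': (e) => `call ${e.callSid} left ${e.conferenceId}`,
  'recording.completed': (e) =>
    `recording ${e.recordingSid} ready for post-processing`,
};

function handleStatusEvent(payload, log = console.log) {
  const handler = statusHandlers[payload.event];
  if (!handler) {
    // Unknown events are logged and ignored rather than failing the webhook.
    log(`ignored event: ${payload.event}`);
    return null;
  }
  const line = handler(payload);
  log(line);
  return line;
}

handleStatusEvent({
  event: 'conference.completed',
  conferenceId: 'CF123',
  durationSeconds: 184,
});
```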
File: functions/transcription-webhook.js
Purpose: Receive Voice Intelligence transcripts and operator results
Operator Results:
- Sentiment analysis (positive/negative/neutral)
- PII extraction (names, phone numbers, emails)
- Conversation classification
- Call resolution detection
- Fetches persona data from deployed assets
- Caches personas to reduce HTTP requests
- Builds OpenAI system prompts from persona characteristics
- Rate limiting using Sync counter
- Conversation history storage/retrieval
- Automatic TTL management
- Validates Twilio webhook signatures
- Prevents unauthorized requests
- Uses Twilio's built-in validator
- Retry logic with exponential backoff
- Circuit breaker pattern
- Structured error logging
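The retry and circuit breaker behavior can be sketched as follows, using the limits this document states (3 retries, breaker after 5 consecutive failures). The base delay, the cap, and all function and class names are assumptions. The breaker here is only a consecutive-failure counter; it has no cool-down or half-open state, which a production breaker would need.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
// The 250 ms base and 4 s cap are assumed values.
function backoffDelayMs(attempt, baseMs = 250, maxMs = 4000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Minimal breaker: trips after N consecutive failures, resets on success.
class CircuitBreaker {
  constructor(failureThreshold = 5) {
    this.failureThreshold = failureThreshold;
    this.consecutiveFailures = 0;
  }
  get isOpen() {
    return this.consecutiveFailures >= this.failureThreshold;
  }
  recordSuccess() {
    this.consecutiveFailures = 0;
  }
  recordFailure() {
    this.consecutiveFailures += 1;
  }
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `fn`, retrying up to `maxRetries` times with exponential backoff.
// `sleep` is injectable so the delays can be observed without waiting.
async function withRetry(fn, breaker, { maxRetries = 3, sleep = defaultSleep } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    if (breaker.isOpen) {
      throw new Error('Circuit open: skipping call');
    }
    try {
      const result = await fn();
      breaker.recordSuccess();
      return result;
    } catch (err) {
      breaker.recordFailure();
      if (attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}

console.log([0, 1, 2, 3].map((n) => backoffDelayMs(n))); // prints [ 250, 500, 1000, 2000 ]
```

A caller would share one `CircuitBreaker` per downstream dependency, for example one for OpenAI and one for Sync, so that failures in one do not block the other.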
customers.json → persona-loader → OpenAI system prompt
agents.json → persona-loader → OpenAI system prompt
SpeechResult → respond → Sync (write)
Sync (read) → respond → OpenAI context
Recording → Voice Intelligence → Transcript + Operators
Event Streams → Segment CDP → Customer Profiles
All webhooks validate Twilio signatures using the X-Twilio-Signature header:
const twilioSignature = event.request.headers['x-twilio-signature'];
const isValid = twilio.validateRequest(authToken, twilioSignature, url, params);
- Stored in .env (never committed)
- Required vars validated at startup
- Secrets never logged or exposed
- Daily call cap via MAX_DAILY_CALLS
- Stored in Twilio Sync (atomic increment)
- Prevents runaway OpenAI costs
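For readers who want to see what the X-Twilio-Signature check described earlier in this section verifies, here is a sketch of the scheme for form-encoded POST webhooks: an HMAC-SHA1, keyed with the auth token, over the full URL followed by each parameter name and value in name order, then Base64-encoded. The function names are hypothetical, and production code should keep using the SDK's validator rather than a hand-rolled version.

```javascript
const crypto = require('crypto');

// Sketch of the signature computation for form-encoded webhooks.
// Function names are hypothetical; prefer the SDK validator in practice.
function expectedSignature(authToken, url, params) {
  // Append each parameter name and value to the URL, sorted by name.
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return crypto
    .createHmac('sha1', authToken)
    .update(Buffer.from(data, 'utf-8'))
    .digest('base64');
}

function isValidSignature(authToken, headerValue, url, params) {
  const expected = Buffer.from(expectedSignature(authToken, url, params));
  const received = Buffer.from(headerValue || '');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return (
    expected.length === received.length &&
    crypto.timingSafeEqual(expected, received)
  );
}

// Placeholder values for illustration only.
const demoToken = 'example-auth-token';
const demoUrl = 'https://example.com/respond';
const demoParams = { SpeechResult: 'hello', CallSid: 'CA123' };
const demoSignature = expectedSignature(demoToken, demoUrl, demoParams);
console.log(isValidSignature(demoToken, demoSignature, demoUrl, demoParams)); // prints true
```

The comparison uses a constant-time check so that response timing does not leak how much of a forged signature matched.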
- Serverless: Auto-scales with Twilio Functions
- Stateless: No local state, uses Sync for persistence
- Async: Event-driven webhooks
- OpenAI API: Rate limited by OpenAI (tier-based)
- Twilio Sync: 100 RPS per service
- Voice Intelligence: Transcription is asynchronous
- Batch Processing: Use bulk conference creation
- Queueing: Add SQS/EventBridge for high-volume scenarios
- Caching: Persona data is cached in memory
- Sharding: Multiple Sync services for >100 RPS
- Exponential backoff for transient errors
- Max 3 retries for API calls
- Circuit breaker after 5 consecutive failures
- Falls back to default responses if persona not found
- Returns generic message if OpenAI fails
- Continues conversation if Sync unavailable (with warning)
- Error webhook: /error-handler for Twilio Debugger
- Structured logging for all errors
- Severity classification (CRITICAL/HIGH/MEDIUM/LOW)
- Mock Twilio SDK
- Mock OpenAI responses
- Test TwiML generation
- Validate error handling
- Test webhook flows
- Validate Sync interactions
- Test persona loading
- Real Twilio calls
- OpenAI API integration
- Full conversation workflows
Local Dev → Pre-Deploy Checks → Twilio Serverless → Post-Deploy Validation
Checks:
✓ Environment variables set
✓ All tests passing
✓ Lint/format passing
✓ Persona data valid
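The first check above could be implemented along these lines. Only `MAX_DAILY_CALLS` is named in this document; the other variable names in the list are placeholders, and the helper names are hypothetical.

```javascript
// Sketch of a pre-deploy environment check. Only MAX_DAILY_CALLS is named
// in this document; the other variable names are placeholders.
const REQUIRED_VARS = ['ACCOUNT_SID', 'AUTH_TOKEN', 'OPENAI_API_KEY', 'MAX_DAILY_CALLS'];

function findMissingEnv(env, required = REQUIRED_VARS) {
  // Treat unset and blank values the same way.
  return required.filter(
    (name) => env[name] === undefined || String(env[name]).trim() === ''
  );
}

function assertEnv(env, required = REQUIRED_VARS) {
  const missing = findMissingEnv(env, required);
  if (missing.length > 0) {
    // Report names only, never values, so secrets stay out of logs.
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}

const sampleEnv = { ACCOUNT_SID: 'ACxxx', AUTH_TOKEN: 'secret', MAX_DAILY_CALLS: '50' };
console.log(findMissingEnv(sampleEnv)); // prints [ 'OPENAI_API_KEY' ]
```

Calling `assertEnv(process.env)` at startup makes a missing variable fail fast instead of surfacing mid-call.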
Validation:
✓ Functions deployed
✓ Assets uploaded
✓ Webhooks responding
✓ Health check passing
- Real-time conversation monitoring dashboard
- Custom LLM fine-tuning for persona consistency
- Multi-language support
- Advanced emotion detection
- Call recording playback UI
- Redis cache for persona data
- Message queue for high-volume scenarios
- Multi-region deployment
- CDN for asset delivery
For implementation details, see: