| layout | default |
|---|---|
| title | Chapter 8: Production Deployment |
| parent | Vercel AI Tutorial |
| nav_order | 8 |
Welcome to Chapter 8: Production Deployment. In this part of Vercel AI SDK Tutorial: Production TypeScript AI Apps and Agents, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
Congratulations! 🎉 You've made it to the final chapter. Now it's time to take your AI applications from development to production. We'll deploy to Vercel, implement monitoring, optimize performance, and ensure your applications can handle real-world traffic.
Let's start with deploying your AI applications to Vercel:
# Install Vercel CLI
npm i -g vercel
# Login to Vercel
vercel login
# Initialize project
vercel
# Add environment variables
vercel env add OPENAI_API_KEY
vercel env add ANTHROPIC_API_KEY
vercel env add DATABASE_URL
vercel env add NEXTAUTH_SECRET
vercel env add NEXTAUTH_URL
# Deploy
vercel --prod// vercel.json
{
"functions": {
"app/api/**/*.ts": {
"maxDuration": 30
},
"app/api/chat/**/*.ts": {
"maxDuration": 60
}
},
"crons": [
{
"path": "/api/cron/cleanup",
"schedule": "0 2 * * *"
}
],
"regions": ["iad1"],
"buildCommand": "npm run build",
"installCommand": "npm install",
"framework": "nextjs"
}// lib/env.ts
import { z } from 'zod'
const envSchema = z.object({
OPENAI_API_KEY: z.string().min(1),
ANTHROPIC_API_KEY: z.string().optional(),
DATABASE_URL: z.string().url(),
NEXTAUTH_SECRET: z.string().min(1),
NEXTAUTH_URL: z.string().url(),
REDIS_URL: z.string().url().optional(),
SENTRY_DSN: z.string().url().optional(),
STRIPE_SECRET_KEY: z.string().optional(),
UPLOADTHING_SECRET: z.string().optional(),
})
export const env = envSchema.parse(process.env)
// Type-safe environment variables
declare global {
namespace NodeJS {
interface ProcessEnv extends z.infer<typeof envSchema> {}
}
}// lib/prisma/production.ts
import { PrismaClient } from '@prisma/client'
const globalForPrisma = globalThis as unknown as {
prisma: PrismaClient | undefined
}
export const prisma =
globalForPrisma.prisma ??
new PrismaClient({
log: process.env.NODE_ENV === 'development' ? ['query', 'error', 'warn'] : ['error'],
datasources: {
db: {
url: process.env.DATABASE_URL,
},
},
})
if (process.env.NODE_ENV !== 'production') globalForPrisma.prisma = prisma
// Connection pooling for production
if (process.env.NODE_ENV === 'production') {
prisma.$on('beforeExit', async () => {
await prisma.$disconnect()
})
}// lib/cache.ts
import { Redis } from '@upstash/redis'
export const redis = new Redis({
url: process.env.REDIS_URL!,
token: process.env.REDIS_TOKEN!,
})
// AI Response Caching
export class AICache {
private static readonly TTL = 3600 // 1 hour
static async getCachedResponse(prompt: string, model: string): Promise<string | null> {
const key = `ai:${model}:${this.hashPrompt(prompt)}`
return await redis.get(key)
}
static async setCachedResponse(prompt: string, model: string, response: string): Promise<void> {
const key = `ai:${model}:${this.hashPrompt(prompt)}`
await redis.setex(key, this.TTL, response)
}
private static hashPrompt(prompt: string): string {
let hash = 0
for (let i = 0; i < prompt.length; i++) {
const char = prompt.charCodeAt(i)
hash = ((hash << 5) - hash) + char
hash = hash & hash // Convert to 32-bit integer
}
return Math.abs(hash).toString(36)
}
}
// Rate Limiting
export class RateLimiter {
private static readonly WINDOW = 60 // 1 minute
private static readonly MAX_REQUESTS = 100
static async checkLimit(userId: string): Promise<boolean> {
const key = `ratelimit:${userId}`
const requests = await redis.incr(key)
if (requests === 1) {
await redis.expire(key, this.WINDOW)
}
return requests <= this.MAX_REQUESTS
}
static async getRemainingRequests(userId: string): Promise<number> {
const key = `ratelimit:${userId}`
const requests = await redis.get(key) || 0
return Math.max(0, this.MAX_REQUESTS - Number(requests))
}
}// lib/monitoring.ts
import * as Sentry from '@sentry/nextjs'
// Sentry Configuration
Sentry.init({
dsn: process.env.SENTRY_DSN,
tracesSampleRate: 1.0,
environment: process.env.NODE_ENV,
integrations: [
new Sentry.Integrations.Http({ tracing: true }),
new Sentry.Integrations.Console(),
],
})
// Custom error tracking for AI operations
export class AIMonitoring {
static trackAIRequest(model: string, prompt: string, userId?: string) {
Sentry.addBreadcrumb({
category: 'ai',
message: 'AI Request',
data: {
model,
promptLength: prompt.length,
userId,
},
})
}
static trackAIResponse(model: string, response: string, duration: number, userId?: string) {
Sentry.addBreadcrumb({
category: 'ai',
message: 'AI Response',
data: {
model,
responseLength: response.length,
duration,
userId,
},
})
}
static trackAIError(model: string, error: Error, userId?: string) {
Sentry.captureException(error, {
tags: {
ai_model: model,
user_id: userId,
},
extra: {
model,
errorMessage: error.message,
},
})
}
}
// Performance monitoring
export class PerformanceMonitor {
static async trackAPIResponse(endpoint: string, duration: number, status: number) {
// Send to monitoring service (e.g., DataDog, New Relic)
console.log(`API Performance: ${endpoint} - ${duration}ms - ${status}`)
}
static async trackAIUsage(model: string, tokens: number, cost: number) {
// Track AI usage and costs
console.log(`AI Usage: ${model} - ${tokens} tokens - $${cost}`)
}
}// lib/error-handler.ts
import { NextRequest, NextResponse } from 'next/server'
import { AIMonitoring } from './monitoring'
export class AIErrorHandler {
static handleAPIError(error: any, model: string, userId?: string) {
AIMonitoring.trackAIError(model, error, userId)
// Determine error type and response
if (error?.response?.status === 429) {
return NextResponse.json(
{ error: 'Rate limit exceeded. Please try again later.' },
{ status: 429 }
)
}
if (error?.response?.status === 401) {
return NextResponse.json(
{ error: 'Invalid API key configuration.' },
{ status: 500 }
)
}
if (error?.code === 'ECONNREFUSED') {
return NextResponse.json(
{ error: 'AI service temporarily unavailable.' },
{ status: 503 }
)
}
return NextResponse.json(
{ error: 'An unexpected error occurred.' },
{ status: 500 }
)
}
static async withFallback<T>(
primaryFn: () => Promise<T>,
fallbackFn: () => Promise<T>,
model: string,
userId?: string
): Promise<T> {
try {
return await primaryFn()
} catch (error) {
console.warn(`Primary AI call failed for ${model}, using fallback`)
AIMonitoring.trackAIError(model, error as Error, userId)
try {
return await fallbackFn()
} catch (fallbackError) {
console.error(`Fallback AI call also failed for ${model}`)
AIMonitoring.trackAIError(model, fallbackError as Error, userId)
throw fallbackError
}
}
}
}
// Circuit breaker pattern
export class CircuitBreaker {
private failures = 0
private lastFailureTime = 0
private readonly threshold = 5
private readonly timeout = 60000 // 1 minute
async execute<T>(fn: () => Promise<T>): Promise<T> {
if (this.isOpen()) {
throw new Error('Circuit breaker is open')
}
try {
const result = await fn()
this.onSuccess()
return result
} catch (error) {
this.onFailure()
throw error
}
}
private isOpen(): boolean {
if (this.failures >= this.threshold) {
const timeSinceLastFailure = Date.now() - this.lastFailureTime
if (timeSinceLastFailure < this.timeout) {
return true
} else {
// Reset after timeout
this.failures = 0
return false
}
}
return false
}
private onSuccess() {
this.failures = 0
}
private onFailure() {
this.failures++
this.lastFailureTime = Date.now()
}
}// app/api/chat/route.ts (Production Version)
import { NextRequest, NextResponse } from 'next/server'
import { getServerSession } from 'next-auth'
import { streamText } from 'ai'
import { openai } from '@ai-sdk/openai'
import { anthropic } from '@ai-sdk/anthropic'
import { AICache, RateLimiter, AIMonitoring, AIErrorHandler, CircuitBreaker } from '@/lib'
const circuitBreaker = new CircuitBreaker()
export async function POST(req: NextRequest) {
try {
// Authentication
const session = await getServerSession()
if (!session?.user?.id) {
return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
}
// Rate limiting
const canProceed = await RateLimiter.checkLimit(session.user.id)
if (!canProceed) {
return NextResponse.json(
{ error: 'Rate limit exceeded' },
{ status: 429 }
)
}
const { messages, model = 'gpt-4', useCache = true } = await req.json()
// Input validation
if (!messages || !Array.isArray(messages)) {
return NextResponse.json(
{ error: 'Invalid messages format' },
{ status: 400 }
)
}
const lastMessage = messages[messages.length - 1]?.content
if (!lastMessage) {
return NextResponse.json(
{ error: 'No message content provided' },
{ status: 400 }
)
}
// Cache check
if (useCache) {
const cachedResponse = await AICache.getCachedResponse(lastMessage, model)
if (cachedResponse) {
AIMonitoring.trackAIRequest(model, lastMessage, session.user.id)
return NextResponse.json({ response: cachedResponse, cached: true })
}
}
// Model selection with fallback
const getModel = () => {
switch (model) {
case 'gpt-4':
return openai('gpt-4')
case 'claude':
return anthropic('claude-3-sonnet-20240229')
default:
return openai('gpt-3.5-turbo')
}
}
AIMonitoring.trackAIRequest(model, lastMessage, session.user.id)
// Circuit breaker protection
const result = await circuitBreaker.execute(async () => {
return await AIErrorHandler.withFallback(
async () => {
const primaryModel = getModel()
return await streamText({
model: primaryModel,
messages,
temperature: 0.7,
maxTokens: 1000,
})
},
async () => {
// Fallback to GPT-3.5
return await streamText({
model: openai('gpt-3.5-turbo'),
messages,
temperature: 0.7,
maxTokens: 1000,
})
},
model,
session.user.id
)
})
// Cache the response
if (useCache) {
let fullResponse = ''
for await (const delta of result.textStream) {
fullResponse += delta
}
await AICache.setCachedResponse(lastMessage, model, fullResponse)
}
return result.toDataStreamResponse()
} catch (error) {
return AIErrorHandler.handleAPIError(error, 'unknown', session?.user?.id)
}
}// app/api/health/route.ts
import { NextRequest, NextResponse } from 'next/server'
import { prisma } from '@/lib/prisma'
import { redis } from '@/lib/cache'
export async function GET(req: NextRequest) {
const checks = {
timestamp: new Date().toISOString(),
uptime: process.uptime(),
version: process.version,
environment: process.env.NODE_ENV,
}
// Database health check
try {
await prisma.$queryRaw`SELECT 1`
checks.database = 'healthy'
} catch (error) {
checks.database = 'unhealthy'
checks.databaseError = error.message
}
// Redis health check
try {
await redis.ping()
checks.redis = 'healthy'
} catch (error) {
checks.redis = 'unhealthy'
checks.redisError = error.message
}
// AI service health check
try {
const response = await fetch('https://api.openai.com/v1/models', {
headers: {
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
},
})
checks.aiService = response.ok ? 'healthy' : 'unhealthy'
} catch (error) {
checks.aiService = 'unhealthy'
checks.aiServiceError = error.message
}
const isHealthy = Object.values(checks).every(
value => typeof value === 'string' && !value.includes('unhealthy')
)
return NextResponse.json(checks, {
status: isHealthy ? 200 : 503,
})
}// lib/load-balancer.ts
export class AILoadBalancer {
private static readonly models = [
{ name: 'gpt-4', priority: 1, capacity: 100 },
{ name: 'gpt-3.5-turbo', priority: 2, capacity: 1000 },
{ name: 'claude-3-sonnet', priority: 1, capacity: 100 },
]
private static usage = new Map<string, number>()
static selectModel(userTier: 'free' | 'premium' | 'enterprise' = 'free'): string {
const availableModels = this.models.filter(model => {
const currentUsage = this.usage.get(model.name) || 0
return currentUsage < model.capacity
})
if (availableModels.length === 0) {
throw new Error('All AI models at capacity')
}
// Sort by priority and availability
availableModels.sort((a, b) => {
if (a.priority !== b.priority) {
return a.priority - b.priority
}
const aUsage = this.usage.get(a.name) || 0
const bUsage = this.usage.get(b.name) || 0
return aUsage - bUsage
})
return availableModels[0].name
}
static recordUsage(model: string) {
const current = this.usage.get(model) || 0
this.usage.set(model, current + 1)
// Reset usage counters periodically
setInterval(() => {
this.usage.clear()
}, 3600000) // Reset every hour
}
}
// Middleware for load balancing
import { NextRequest, NextResponse } from 'next/server'
export function loadBalancingMiddleware(request: NextRequest) {
const userTier = request.headers.get('x-user-tier') as 'free' | 'premium' | 'enterprise' || 'free'
try {
const selectedModel = AILoadBalancer.selectModel(userTier)
const response = NextResponse.next()
response.headers.set('x-selected-model', selectedModel)
return response
} catch (error) {
return NextResponse.json(
{ error: 'AI services temporarily at capacity' },
{ status: 503 }
)
}
}// lib/security.ts
import { NextRequest, NextResponse } from 'next/server'
import { Ratelimit } from '@upstash/ratelimit'
import { redis } from './cache'
export const ratelimit = new Ratelimit({
redis,
limiter: Ratelimit.slidingWindow(100, '1 m'), // 100 requests per minute
})
export class SecurityMiddleware {
static async validateRequest(request: NextRequest): Promise<NextResponse | null> {
// Rate limiting
const ip = request.ip ?? '127.0.0.1'
const { success } = await ratelimit.limit(ip)
if (!success) {
return NextResponse.json(
{ error: 'Rate limit exceeded' },
{ status: 429 }
)
}
// Input sanitization
const body = await request.json().catch(() => ({}))
const sanitizedBody = this.sanitizeInput(body)
// Content filtering
if (this.containsHarmfulContent(sanitizedBody)) {
return NextResponse.json(
{ error: 'Content violates usage policy' },
{ status: 400 }
)
}
return null
}
private static sanitizeInput(input: any): any {
if (typeof input === 'string') {
return input
.replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '')
.replace(/javascript:/gi, '')
.slice(0, 10000) // Limit input length
}
if (Array.isArray(input)) {
return input.map(item => this.sanitizeInput(item))
}
if (typeof input === 'object' && input !== null) {
const sanitized: any = {}
for (const [key, value] of Object.entries(input)) {
if (key.length < 100) { // Limit key length
sanitized[key] = this.sanitizeInput(value)
}
}
return sanitized
}
return input
}
private static containsHarmfulContent(input: any): boolean {
const harmfulPatterns = [
/<script/i,
/javascript:/i,
/on\w+\s*=/i,
/eval\s*\(/i,
/document\./i,
/window\./i,
]
const inputString = JSON.stringify(input).toLowerCase()
return harmfulPatterns.some(pattern => pattern.test(inputString))
}
}
// API route protection
export function withSecurity<T extends any[]>(
handler: (...args: T) => Promise<NextResponse>,
options: { requireAuth?: boolean } = {}
) {
return async (...args: T): Promise<NextResponse> => {
const [request] = args as [NextRequest]
// Security validation
const securityResponse = await SecurityMiddleware.validateRequest(request)
if (securityResponse) {
return securityResponse
}
// Authentication check
if (options.requireAuth) {
const session = await getServerSession()
if (!session) {
return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
}
}
return handler(...args)
}
}// lib/backup.ts
import { exec } from 'child_process'
import { promisify } from 'util'
import { writeFile, readFile } from 'fs/promises'
const execAsync = promisify(exec)
export class BackupManager {
static async createDatabaseBackup(): Promise<string> {
const timestamp = new Date().toISOString().replace(/[:.]/g, '-')
const filename = `backup-${timestamp}.sql`
try {
const { stdout } = await execAsync(
`pg_dump ${process.env.DATABASE_URL} > ${filename}`
)
// Upload to cloud storage (e.g., AWS S3)
await this.uploadToCloud(filename)
return filename
} catch (error) {
console.error('Backup failed:', error)
throw error
}
}
static async restoreDatabaseBackup(filename: string): Promise<void> {
try {
// Download from cloud storage
await this.downloadFromCloud(filename)
const { stdout } = await execAsync(
`psql ${process.env.DATABASE_URL} < ${filename}`
)
console.log('Database restored successfully')
} catch (error) {
console.error('Restore failed:', error)
throw error
}
}
static async backupUserData(userId: string): Promise<void> {
try {
// Export user data
const userData = await prisma.user.findUnique({
where: { id: userId },
include: {
posts: true,
tickets: true,
},
})
const filename = `user-${userId}-${Date.now()}.json`
await writeFile(filename, JSON.stringify(userData, null, 2))
// Encrypt and upload
await this.encryptAndUpload(filename, userId)
} catch (error) {
console.error('User data backup failed:', error)
throw error
}
}
private static async uploadToCloud(filename: string): Promise<void> {
// Implement cloud upload (AWS S3, Google Cloud, etc.)
console.log(`Uploading ${filename} to cloud storage`)
}
private static async downloadFromCloud(filename: string): Promise<void> {
// Implement cloud download
console.log(`Downloading ${filename} from cloud storage`)
}
private static async encryptAndUpload(filename: string, userId: string): Promise<void> {
// Implement encryption and upload
console.log(`Encrypting and uploading ${filename}`)
}
}
// Automated backup cron job
export async function scheduledBackup() {
try {
const filename = await BackupManager.createDatabaseBackup()
console.log(`Automated backup created: ${filename}`)
// Clean up old backups (keep last 30 days)
await cleanupOldBackups(30)
} catch (error) {
console.error('Scheduled backup failed:', error)
// Send alert to monitoring service
}
}
async function cleanupOldBackups(daysToKeep: number): Promise<void> {
// Implement cleanup logic
console.log(`Cleaning up backups older than ${daysToKeep} days`)
}✅ Pre-deployment:
- Environment variables configured
- Database migrations run
- SSL certificates configured
- Domain DNS configured
- CDN setup (optional)
✅ Security:
- Rate limiting implemented
- Input validation added
- Authentication configured
- HTTPS enabled
- CORS configured
✅ Performance:
- Caching implemented
- Database queries optimized
- Images optimized
- CDN configured
- Compression enabled
✅ Monitoring:
- Error tracking (Sentry)
- Performance monitoring
- Uptime monitoring
- Log aggregation
✅ Backup & Recovery:
- Database backups automated
- User data backup system
- Recovery procedures documented
- Failover systems configured
🎉 Congratulations! You've successfully deployed production-ready AI applications:
- Vercel Deployment - Seamless deployment with environment configuration
- Production Database - Optimized Prisma setup with connection pooling
- Caching & Performance - Redis caching and rate limiting
- Monitoring & Observability - Sentry integration and custom monitoring
- Error Handling - Circuit breakers and fallback strategies
- Security - Rate limiting, input validation, and content filtering
- Scaling - Load balancing and capacity management
- Backup & Recovery - Automated backups and recovery procedures
- Health Checks - Comprehensive system monitoring
- Production Checklist - Complete deployment verification
You've now built and deployed complete AI-powered applications! Here's what's next:
🚀 Scale to Millions: Implement advanced caching, CDN, and microservices
🤖 Custom AI Models: Fine-tune models for your specific use cases
📊 Advanced Analytics: Track AI performance and user engagement
🔒 Enterprise Security: SSO, audit logs, and compliance features
🌍 Global Deployment: Multi-region deployment and localization
⚡ Edge Computing: Deploy AI models closer to users
🎯 A/B Testing: Test different AI models and prompts
📈 Revenue Models: Monetize your AI applications
Remember:
- Start Small: Begin with simple features and iterate
- Monitor Everything: Use data to guide your decisions
- Security First: Always prioritize user data protection
- Performance Matters: Users expect fast, responsive AI
- Ethics Count: Build responsible AI applications
- Community: Join the AI developer community for support
You've accomplished something amazing! The world of AI application development is yours to explore. Keep building, keep learning, and keep creating incredible experiences with AI.
What's your next AI adventure? 🌟
Thank you for completing the Vercel AI Tutorial! Your journey into AI-powered application development has just begun. 🚀
Most teams struggle here because the hard part is not writing more code, but deciding clear boundaries for error, model, static so behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about Chapter 8: Production Deployment as an operating subsystem inside Vercel AI SDK Tutorial: Production TypeScript AI Apps and Agents, with explicit contracts for inputs, state transitions, and outputs.
Use the implementation notes around NextResponse, userId, Promise as your checklist when adapting these patterns to your own repository.
Under the hood, Chapter 8: Production Deployment usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for
error. - Input normalization: shape incoming data so
modelreceives stable contracts. - Core execution: run the main logic branch and propagate intermediate state through
static. - Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
Use the following upstream sources to verify implementation details while reading this chapter:
- AI SDK Repository
Why it matters: authoritative reference on
AI SDK Repository(github.com). - AI SDK Releases
Why it matters: authoritative reference on
AI SDK Releases(github.com). - AI SDK Docs
Why it matters: authoritative reference on
AI SDK Docs(ai-sdk.dev).
Suggested trace strategy:
- search upstream code for
errorandmodelto map concrete implementation paths - compare docs claims against actual runtime/config code before reusing patterns in production