Skip to content

Latest commit

 

History

History
947 lines (779 loc) · 25.9 KB

File metadata and controls

947 lines (779 loc) · 25.9 KB
layout default
title Chapter 8: Production Deployment
parent Vercel AI Tutorial
nav_order 8

Chapter 8: Production Deployment

Welcome to Chapter 8: Production Deployment. In this part of Vercel AI SDK Tutorial: Production TypeScript AI Apps and Agents, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.

Congratulations! 🎉 You've made it to the final chapter. Now it's time to take your AI applications from development to production. We'll deploy to Vercel, implement monitoring, optimize performance, and ensure your applications can handle real-world traffic.

Vercel Deployment Setup

Let's start with deploying your AI applications to Vercel:

# Install Vercel CLI
npm i -g vercel

# Login to Vercel
vercel login

# Initialize project
vercel

# Add environment variables
vercel env add OPENAI_API_KEY
vercel env add ANTHROPIC_API_KEY
vercel env add DATABASE_URL
vercel env add NEXTAUTH_SECRET
vercel env add NEXTAUTH_URL

# Deploy
vercel --prod

Vercel Configuration

// vercel.json
{
  "functions": {
    "app/api/**/*.ts": {
      "maxDuration": 30
    },
    "app/api/chat/**/*.ts": {
      "maxDuration": 60
    }
  },
  "crons": [
    {
      "path": "/api/cron/cleanup",
      "schedule": "0 2 * * *"
    }
  ],
  "regions": ["iad1"],
  "buildCommand": "npm run build",
  "installCommand": "npm install",
  "framework": "nextjs"
}

Environment Variables and Secrets

// lib/env.ts
import { z } from 'zod'

const envSchema = z.object({
  OPENAI_API_KEY: z.string().min(1),
  ANTHROPIC_API_KEY: z.string().optional(),
  DATABASE_URL: z.string().url(),
  NEXTAUTH_SECRET: z.string().min(1),
  NEXTAUTH_URL: z.string().url(),
  REDIS_URL: z.string().url().optional(),
  SENTRY_DSN: z.string().url().optional(),
  STRIPE_SECRET_KEY: z.string().optional(),
  UPLOADTHING_SECRET: z.string().optional(),
})

export const env = envSchema.parse(process.env)

// Type-safe environment variables
declare global {
  namespace NodeJS {
    interface ProcessEnv extends z.infer<typeof envSchema> {}
  }
}

Database Production Setup

// lib/prisma/production.ts
import { PrismaClient } from '@prisma/client'

const globalForPrisma = globalThis as unknown as {
  prisma: PrismaClient | undefined
}

export const prisma =
  globalForPrisma.prisma ??
  new PrismaClient({
    log: process.env.NODE_ENV === 'development' ? ['query', 'error', 'warn'] : ['error'],
    datasources: {
      db: {
        url: process.env.DATABASE_URL,
      },
    },
  })

if (process.env.NODE_ENV !== 'production') globalForPrisma.prisma = prisma

// Connection pooling for production
if (process.env.NODE_ENV === 'production') {
  prisma.$on('beforeExit', async () => {
    await prisma.$disconnect()
  })
}

Caching and Performance Optimization

// lib/cache.ts
import { Redis } from '@upstash/redis'

export const redis = new Redis({
  url: process.env.REDIS_URL!,
  token: process.env.REDIS_TOKEN!,
})

// AI Response Caching
export class AICache {
  private static readonly TTL = 3600 // 1 hour

  static async getCachedResponse(prompt: string, model: string): Promise<string | null> {
    const key = `ai:${model}:${this.hashPrompt(prompt)}`
    return await redis.get(key)
  }

  static async setCachedResponse(prompt: string, model: string, response: string): Promise<void> {
    const key = `ai:${model}:${this.hashPrompt(prompt)}`
    await redis.setex(key, this.TTL, response)
  }

  private static hashPrompt(prompt: string): string {
    let hash = 0
    for (let i = 0; i < prompt.length; i++) {
      const char = prompt.charCodeAt(i)
      hash = ((hash << 5) - hash) + char
      hash = hash & hash // Convert to 32-bit integer
    }
    return Math.abs(hash).toString(36)
  }
}

// Rate Limiting
export class RateLimiter {
  private static readonly WINDOW = 60 // 1 minute
  private static readonly MAX_REQUESTS = 100

  static async checkLimit(userId: string): Promise<boolean> {
    const key = `ratelimit:${userId}`
    const requests = await redis.incr(key)

    if (requests === 1) {
      await redis.expire(key, this.WINDOW)
    }

    return requests <= this.MAX_REQUESTS
  }

  static async getRemainingRequests(userId: string): Promise<number> {
    const key = `ratelimit:${userId}`
    const requests = await redis.get(key) || 0
    return Math.max(0, this.MAX_REQUESTS - Number(requests))
  }
}

Monitoring and Observability

// lib/monitoring.ts
import * as Sentry from '@sentry/nextjs'

// Sentry Configuration
Sentry.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 1.0,
  environment: process.env.NODE_ENV,
  integrations: [
    new Sentry.Integrations.Http({ tracing: true }),
    new Sentry.Integrations.Console(),
  ],
})

// Custom error tracking for AI operations
export class AIMonitoring {
  static trackAIRequest(model: string, prompt: string, userId?: string) {
    Sentry.addBreadcrumb({
      category: 'ai',
      message: 'AI Request',
      data: {
        model,
        promptLength: prompt.length,
        userId,
      },
    })
  }

  static trackAIResponse(model: string, response: string, duration: number, userId?: string) {
    Sentry.addBreadcrumb({
      category: 'ai',
      message: 'AI Response',
      data: {
        model,
        responseLength: response.length,
        duration,
        userId,
      },
    })
  }

  static trackAIError(model: string, error: Error, userId?: string) {
    Sentry.captureException(error, {
      tags: {
        ai_model: model,
        user_id: userId,
      },
      extra: {
        model,
        errorMessage: error.message,
      },
    })
  }
}

// Performance monitoring
export class PerformanceMonitor {
  static async trackAPIResponse(endpoint: string, duration: number, status: number) {
    // Send to monitoring service (e.g., DataDog, New Relic)
    console.log(`API Performance: ${endpoint} - ${duration}ms - ${status}`)
  }

  static async trackAIUsage(model: string, tokens: number, cost: number) {
    // Track AI usage and costs
    console.log(`AI Usage: ${model} - ${tokens} tokens - $${cost}`)
  }
}

Error Handling and Fallbacks

// lib/error-handler.ts
import { NextRequest, NextResponse } from 'next/server'
import { AIMonitoring } from './monitoring'

export class AIErrorHandler {
  static handleAPIError(error: any, model: string, userId?: string) {
    AIMonitoring.trackAIError(model, error, userId)

    // Determine error type and response
    if (error?.response?.status === 429) {
      return NextResponse.json(
        { error: 'Rate limit exceeded. Please try again later.' },
        { status: 429 }
      )
    }

    if (error?.response?.status === 401) {
      return NextResponse.json(
        { error: 'Invalid API key configuration.' },
        { status: 500 }
      )
    }

    if (error?.code === 'ECONNREFUSED') {
      return NextResponse.json(
        { error: 'AI service temporarily unavailable.' },
        { status: 503 }
      )
    }

    return NextResponse.json(
      { error: 'An unexpected error occurred.' },
      { status: 500 }
    )
  }

  static async withFallback<T>(
    primaryFn: () => Promise<T>,
    fallbackFn: () => Promise<T>,
    model: string,
    userId?: string
  ): Promise<T> {
    try {
      return await primaryFn()
    } catch (error) {
      console.warn(`Primary AI call failed for ${model}, using fallback`)
      AIMonitoring.trackAIError(model, error as Error, userId)

      try {
        return await fallbackFn()
      } catch (fallbackError) {
        console.error(`Fallback AI call also failed for ${model}`)
        AIMonitoring.trackAIError(model, fallbackError as Error, userId)
        throw fallbackError
      }
    }
  }
}

// Circuit breaker pattern
export class CircuitBreaker {
  private failures = 0
  private lastFailureTime = 0
  private readonly threshold = 5
  private readonly timeout = 60000 // 1 minute

  async execute<T>(fn: () => Promise<T>): Promise<T> {
    if (this.isOpen()) {
      throw new Error('Circuit breaker is open')
    }

    try {
      const result = await fn()
      this.onSuccess()
      return result
    } catch (error) {
      this.onFailure()
      throw error
    }
  }

  private isOpen(): boolean {
    if (this.failures >= this.threshold) {
      const timeSinceLastFailure = Date.now() - this.lastFailureTime
      if (timeSinceLastFailure < this.timeout) {
        return true
      } else {
        // Reset after timeout
        this.failures = 0
        return false
      }
    }
    return false
  }

  private onSuccess() {
    this.failures = 0
  }

  private onFailure() {
    this.failures++
    this.lastFailureTime = Date.now()
  }
}

Production API Routes

// app/api/chat/route.ts (Production Version)
import { NextRequest, NextResponse } from 'next/server'
import { getServerSession } from 'next-auth'
import { streamText } from 'ai'
import { openai } from '@ai-sdk/openai'
import { anthropic } from '@ai-sdk/anthropic'
import { AICache, RateLimiter, AIMonitoring, AIErrorHandler, CircuitBreaker } from '@/lib'

const circuitBreaker = new CircuitBreaker()

export async function POST(req: NextRequest) {
  try {
    // Authentication
    const session = await getServerSession()
    if (!session?.user?.id) {
      return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
    }

    // Rate limiting
    const canProceed = await RateLimiter.checkLimit(session.user.id)
    if (!canProceed) {
      return NextResponse.json(
        { error: 'Rate limit exceeded' },
        { status: 429 }
      )
    }

    const { messages, model = 'gpt-4', useCache = true } = await req.json()

    // Input validation
    if (!messages || !Array.isArray(messages)) {
      return NextResponse.json(
        { error: 'Invalid messages format' },
        { status: 400 }
      )
    }

    const lastMessage = messages[messages.length - 1]?.content
    if (!lastMessage) {
      return NextResponse.json(
        { error: 'No message content provided' },
        { status: 400 }
      )
    }

    // Cache check
    if (useCache) {
      const cachedResponse = await AICache.getCachedResponse(lastMessage, model)
      if (cachedResponse) {
        AIMonitoring.trackAIRequest(model, lastMessage, session.user.id)
        return NextResponse.json({ response: cachedResponse, cached: true })
      }
    }

    // Model selection with fallback
    const getModel = () => {
      switch (model) {
        case 'gpt-4':
          return openai('gpt-4')
        case 'claude':
          return anthropic('claude-3-sonnet-20240229')
        default:
          return openai('gpt-3.5-turbo')
      }
    }

    AIMonitoring.trackAIRequest(model, lastMessage, session.user.id)

    // Circuit breaker protection
    const result = await circuitBreaker.execute(async () => {
      return await AIErrorHandler.withFallback(
        async () => {
          const primaryModel = getModel()
          return await streamText({
            model: primaryModel,
            messages,
            temperature: 0.7,
            maxTokens: 1000,
          })
        },
        async () => {
          // Fallback to GPT-3.5
          return await streamText({
            model: openai('gpt-3.5-turbo'),
            messages,
            temperature: 0.7,
            maxTokens: 1000,
          })
        },
        model,
        session.user.id
      )
    })

    // Cache the response
    if (useCache) {
      let fullResponse = ''
      for await (const delta of result.textStream) {
        fullResponse += delta
      }
      await AICache.setCachedResponse(lastMessage, model, fullResponse)
    }

    return result.toDataStreamResponse()

  } catch (error) {
    return AIErrorHandler.handleAPIError(error, 'unknown', session?.user?.id)
  }
}

Health Checks and Monitoring

// app/api/health/route.ts
import { NextRequest, NextResponse } from 'next/server'
import { prisma } from '@/lib/prisma'
import { redis } from '@/lib/cache'

export async function GET(req: NextRequest) {
  const checks = {
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    version: process.version,
    environment: process.env.NODE_ENV,
  }

  // Database health check
  try {
    await prisma.$queryRaw`SELECT 1`
    checks.database = 'healthy'
  } catch (error) {
    checks.database = 'unhealthy'
    checks.databaseError = error.message
  }

  // Redis health check
  try {
    await redis.ping()
    checks.redis = 'healthy'
  } catch (error) {
    checks.redis = 'unhealthy'
    checks.redisError = error.message
  }

  // AI service health check
  try {
    const response = await fetch('https://api.openai.com/v1/models', {
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
      },
    })
    checks.aiService = response.ok ? 'healthy' : 'unhealthy'
  } catch (error) {
    checks.aiService = 'unhealthy'
    checks.aiServiceError = error.message
  }

  const isHealthy = Object.values(checks).every(
    value => typeof value === 'string' && !value.includes('unhealthy')
  )

  return NextResponse.json(checks, {
    status: isHealthy ? 200 : 503,
  })
}

Scaling and Load Balancing

// lib/load-balancer.ts
export class AILoadBalancer {
  private static readonly models = [
    { name: 'gpt-4', priority: 1, capacity: 100 },
    { name: 'gpt-3.5-turbo', priority: 2, capacity: 1000 },
    { name: 'claude-3-sonnet', priority: 1, capacity: 100 },
  ]

  private static usage = new Map<string, number>()

  static selectModel(userTier: 'free' | 'premium' | 'enterprise' = 'free'): string {
    const availableModels = this.models.filter(model => {
      const currentUsage = this.usage.get(model.name) || 0
      return currentUsage < model.capacity
    })

    if (availableModels.length === 0) {
      throw new Error('All AI models at capacity')
    }

    // Sort by priority and availability
    availableModels.sort((a, b) => {
      if (a.priority !== b.priority) {
        return a.priority - b.priority
      }
      const aUsage = this.usage.get(a.name) || 0
      const bUsage = this.usage.get(b.name) || 0
      return aUsage - bUsage
    })

    return availableModels[0].name
  }

  static recordUsage(model: string) {
    const current = this.usage.get(model) || 0
    this.usage.set(model, current + 1)

    // Reset usage counters periodically
    setInterval(() => {
      this.usage.clear()
    }, 3600000) // Reset every hour
  }
}

// Middleware for load balancing
import { NextRequest, NextResponse } from 'next/server'

export function loadBalancingMiddleware(request: NextRequest) {
  const userTier = request.headers.get('x-user-tier') as 'free' | 'premium' | 'enterprise' || 'free'

  try {
    const selectedModel = AILoadBalancer.selectModel(userTier)

    const response = NextResponse.next()
    response.headers.set('x-selected-model', selectedModel)

    return response
  } catch (error) {
    return NextResponse.json(
      { error: 'AI services temporarily at capacity' },
      { status: 503 }
    )
  }
}

Security Best Practices

// lib/security.ts
import { NextRequest, NextResponse } from 'next/server'
import { Ratelimit } from '@upstash/ratelimit'
import { redis } from './cache'

export const ratelimit = new Ratelimit({
  redis,
  limiter: Ratelimit.slidingWindow(100, '1 m'), // 100 requests per minute
})

export class SecurityMiddleware {
  static async validateRequest(request: NextRequest): Promise<NextResponse | null> {
    // Rate limiting
    const ip = request.ip ?? '127.0.0.1'
    const { success } = await ratelimit.limit(ip)

    if (!success) {
      return NextResponse.json(
        { error: 'Rate limit exceeded' },
        { status: 429 }
      )
    }

    // Input sanitization
    const body = await request.json().catch(() => ({}))
    const sanitizedBody = this.sanitizeInput(body)

    // Content filtering
    if (this.containsHarmfulContent(sanitizedBody)) {
      return NextResponse.json(
        { error: 'Content violates usage policy' },
        { status: 400 }
      )
    }

    return null
  }

  private static sanitizeInput(input: any): any {
    if (typeof input === 'string') {
      return input
        .replace(/<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>/gi, '')
        .replace(/javascript:/gi, '')
        .slice(0, 10000) // Limit input length
    }

    if (Array.isArray(input)) {
      return input.map(item => this.sanitizeInput(item))
    }

    if (typeof input === 'object' && input !== null) {
      const sanitized: any = {}
      for (const [key, value] of Object.entries(input)) {
        if (key.length < 100) { // Limit key length
          sanitized[key] = this.sanitizeInput(value)
        }
      }
      return sanitized
    }

    return input
  }

  private static containsHarmfulContent(input: any): boolean {
    const harmfulPatterns = [
      /<script/i,
      /javascript:/i,
      /on\w+\s*=/i,
      /eval\s*\(/i,
      /document\./i,
      /window\./i,
    ]

    const inputString = JSON.stringify(input).toLowerCase()

    return harmfulPatterns.some(pattern => pattern.test(inputString))
  }
}

// API route protection
export function withSecurity<T extends any[]>(
  handler: (...args: T) => Promise<NextResponse>,
  options: { requireAuth?: boolean } = {}
) {
  return async (...args: T): Promise<NextResponse> => {
    const [request] = args as [NextRequest]

    // Security validation
    const securityResponse = await SecurityMiddleware.validateRequest(request)
    if (securityResponse) {
      return securityResponse
    }

    // Authentication check
    if (options.requireAuth) {
      const session = await getServerSession()
      if (!session) {
        return NextResponse.json({ error: 'Unauthorized' }, { status: 401 })
      }
    }

    return handler(...args)
  }
}

Backup and Recovery

// lib/backup.ts
import { exec } from 'child_process'
import { promisify } from 'util'
import { writeFile, readFile } from 'fs/promises'

const execAsync = promisify(exec)

export class BackupManager {
  static async createDatabaseBackup(): Promise<string> {
    const timestamp = new Date().toISOString().replace(/[:.]/g, '-')
    const filename = `backup-${timestamp}.sql`

    try {
      const { stdout } = await execAsync(
        `pg_dump ${process.env.DATABASE_URL} > ${filename}`
      )

      // Upload to cloud storage (e.g., AWS S3)
      await this.uploadToCloud(filename)

      return filename
    } catch (error) {
      console.error('Backup failed:', error)
      throw error
    }
  }

  static async restoreDatabaseBackup(filename: string): Promise<void> {
    try {
      // Download from cloud storage
      await this.downloadFromCloud(filename)

      const { stdout } = await execAsync(
        `psql ${process.env.DATABASE_URL} < ${filename}`
      )

      console.log('Database restored successfully')
    } catch (error) {
      console.error('Restore failed:', error)
      throw error
    }
  }

  static async backupUserData(userId: string): Promise<void> {
    try {
      // Export user data
      const userData = await prisma.user.findUnique({
        where: { id: userId },
        include: {
          posts: true,
          tickets: true,
        },
      })

      const filename = `user-${userId}-${Date.now()}.json`
      await writeFile(filename, JSON.stringify(userData, null, 2))

      // Encrypt and upload
      await this.encryptAndUpload(filename, userId)
    } catch (error) {
      console.error('User data backup failed:', error)
      throw error
    }
  }

  private static async uploadToCloud(filename: string): Promise<void> {
    // Implement cloud upload (AWS S3, Google Cloud, etc.)
    console.log(`Uploading ${filename} to cloud storage`)
  }

  private static async downloadFromCloud(filename: string): Promise<void> {
    // Implement cloud download
    console.log(`Downloading ${filename} from cloud storage`)
  }

  private static async encryptAndUpload(filename: string, userId: string): Promise<void> {
    // Implement encryption and upload
    console.log(`Encrypting and uploading ${filename}`)
  }
}

// Automated backup cron job
export async function scheduledBackup() {
  try {
    const filename = await BackupManager.createDatabaseBackup()
    console.log(`Automated backup created: ${filename}`)

    // Clean up old backups (keep last 30 days)
    await cleanupOldBackups(30)
  } catch (error) {
    console.error('Scheduled backup failed:', error)
    // Send alert to monitoring service
  }
}

async function cleanupOldBackups(daysToKeep: number): Promise<void> {
  // Implement cleanup logic
  console.log(`Cleaning up backups older than ${daysToKeep} days`)
}

Production Checklist

Pre-deployment:

  • Environment variables configured
  • Database migrations run
  • SSL certificates configured
  • Domain DNS configured
  • CDN setup (optional)

Security:

  • Rate limiting implemented
  • Input validation added
  • Authentication configured
  • HTTPS enabled
  • CORS configured

Performance:

  • Caching implemented
  • Database queries optimized
  • Images optimized
  • CDN configured
  • Compression enabled

Monitoring:

  • Error tracking (Sentry)
  • Performance monitoring
  • Uptime monitoring
  • Log aggregation

Backup & Recovery:

  • Database backups automated
  • User data backup system
  • Recovery procedures documented
  • Failover systems configured

What We've Accomplished

🎉 Congratulations! You've successfully deployed production-ready AI applications:

  1. Vercel Deployment - Seamless deployment with environment configuration
  2. Production Database - Optimized Prisma setup with connection pooling
  3. Caching & Performance - Redis caching and rate limiting
  4. Monitoring & Observability - Sentry integration and custom monitoring
  5. Error Handling - Circuit breakers and fallback strategies
  6. Security - Rate limiting, input validation, and content filtering
  7. Scaling - Load balancing and capacity management
  8. Backup & Recovery - Automated backups and recovery procedures
  9. Health Checks - Comprehensive system monitoring
  10. Production Checklist - Complete deployment verification

Your AI Journey Continues...

You've now built and deployed complete AI-powered applications! Here's what's next:

🚀 Scale to Millions: Implement advanced caching, CDN, and microservices 🤖 Custom AI Models: Fine-tune models for your specific use cases
📊 Advanced Analytics: Track AI performance and user engagement 🔒 Enterprise Security: SSO, audit logs, and compliance features 🌍 Global Deployment: Multi-region deployment and localization ⚡ Edge Computing: Deploy AI models closer to users 🎯 A/B Testing: Test different AI models and prompts 📈 Revenue Models: Monetize your AI applications

Final Words

Remember:

  • Start Small: Begin with simple features and iterate
  • Monitor Everything: Use data to guide your decisions
  • Security First: Always prioritize user data protection
  • Performance Matters: Users expect fast, responsive AI
  • Ethics Count: Build responsible AI applications
  • Community: Join the AI developer community for support

You've accomplished something amazing! The world of AI application development is yours to explore. Keep building, keep learning, and keep creating incredible experiences with AI.

What's your next AI adventure? 🌟


Thank you for completing the Vercel AI Tutorial! Your journey into AI-powered application development has just begun. 🚀

What Problem Does This Solve?

Most teams struggle here because the hard part is not writing more code, but deciding clear boundaries for error, model, static so behavior stays predictable as complexity grows.

In practical terms, this chapter helps you avoid three common failures:

  • coupling core logic too tightly to one implementation path
  • missing the handoff boundaries between setup, execution, and validation
  • shipping changes without clear rollback or observability strategy

After working through this chapter, you should be able to reason about Chapter 8: Production Deployment as an operating subsystem inside Vercel AI SDK Tutorial: Production TypeScript AI Apps and Agents, with explicit contracts for inputs, state transitions, and outputs.

Use the implementation notes around NextResponse, userId, Promise as your checklist when adapting these patterns to your own repository.

How it Works Under the Hood

Under the hood, Chapter 8: Production Deployment usually follows a repeatable control path:

  1. Context bootstrap: initialize runtime config and prerequisites for error.
  2. Input normalization: shape incoming data so model receives stable contracts.
  3. Core execution: run the main logic branch and propagate intermediate state through static.
  4. Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
  5. Output composition: return canonical result payloads for downstream consumers.
  6. Operational telemetry: emit logs/metrics needed for debugging and performance tuning.

When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.

Source Walkthrough

Use the following upstream sources to verify implementation details while reading this chapter:

  • AI SDK Repository Why it matters: authoritative reference on AI SDK Repository (github.com).
  • AI SDK Releases Why it matters: authoritative reference on AI SDK Releases (github.com).
  • AI SDK Docs Why it matters: authoritative reference on AI SDK Docs (ai-sdk.dev).

Suggested trace strategy:

  • search upstream code for error and model to map concrete implementation paths
  • compare docs claims against actual runtime/config code before reusing patterns in production

Chapter Connections