Skip to content

Latest commit

 

History

History
666 lines (544 loc) · 16.2 KB

File metadata and controls

666 lines (544 loc) · 16.2 KB
layout default
title Chapter 8: Production Deployment
parent Botpress Tutorial
nav_order 8

Chapter 8: Production Deployment

Welcome to Chapter 8: Production Deployment. In this part of Botpress Tutorial: Open Source Conversational AI Platform, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.

This final chapter covers deploying Botpress bots to production environments, including scaling, security, monitoring, and maintenance strategies.

🚀 Production Architecture

Multi-Environment Setup

// Environment configuration
const environments = {
  development: {
    database: 'postgresql://localhost:5432/botpress_dev',
    redis: 'redis://localhost:6379',
    nlu: { confidence: 0.3 },
    logging: { level: 'debug' }
  },
  staging: {
    database: 'postgresql://staging-db:5432/botpress_staging',
    redis: 'redis://staging-redis:6379',
    nlu: { confidence: 0.7 },
    logging: { level: 'info' }
  },
  production: {
    database: 'postgresql://prod-db:5432/botpress_prod',
    redis: 'redis://prod-redis:6379',
    nlu: { confidence: 0.8 },
    logging: { level: 'warn' }
  }
}

const getEnvironmentConfig = (env = process.env.NODE_ENV || 'development') => {
  return environments[env] || environments.development
}

Docker Production Setup

# docker-compose.prod.yml
version: '3.8'
services:
  botpress:
    image: botpress/server:latest
    container_name: botpress_prod
    restart: unless-stopped
    ports:
      - "3000:3000"
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
      - BP_MODULE_ANALYTICS=false
      - CLUSTER_ENABLED=true
      - BPFS_STORAGE=database
    volumes:
      - ./data:/botpress/data
      - ./media:/botpress/media
    depends_on:
      - postgres
      - redis
    networks:
      - botpress_network

  postgres:
    image: postgres:13
    environment:
      - POSTGRES_DB=botpress
      - POSTGRES_USER=botpress
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - botpress_network

  redis:
    image: redis:6-alpine
    volumes:
      - redis_data:/data
    networks:
      - botpress_network

networks:
  botpress_network:
    driver: bridge

volumes:
  postgres_data:
  redis_data:

🔒 Security Hardening

Environment Variables

# .env.production
DATABASE_URL=postgresql://user:password@prod-db:5432/botpress_prod
REDIS_URL=redis://prod-redis:6379
JWT_SECRET=your-super-secure-jwt-secret
ENCRYPTION_KEY=your-256-bit-encryption-key
ADMIN_PASSWORD=secure-admin-password

# Channel secrets
FACEBOOK_APP_SECRET=your-facebook-app-secret
SLACK_SIGNING_SECRET=your-slack-signing-secret
TWILIO_AUTH_TOKEN=your-twilio-auth-token

# Monitoring
SENTRY_DSN=your-sentry-dsn
LOGGING_SERVICE_URL=https://logs.example.com

API Security

// Security middleware for production
const securityMiddleware = [
  // Rate limiting
  rateLimit({
    windowMs: 15 * 60 * 1000, // 15 minutes
    max: 100, // limit each IP to 100 requests per windowMs
    message: 'Too many requests from this IP'
  }),

  // CORS configuration
  cors({
    origin: process.env.ALLOWED_ORIGINS?.split(',') || ['https://yourdomain.com'],
    credentials: true
  }),

  // Helmet for security headers
  helmet({
    contentSecurityPolicy: {
      directives: {
        defaultSrc: ["'self'"],
        styleSrc: ["'self'", "'unsafe-inline'"],
        scriptSrc: ["'self'"],
        imgSrc: ["'self'", "data:", "https:"]
      }
    }
  }),

  // Input sanitization
  express.json({ limit: '10mb' }),
  (req, res, next) => {
    // Sanitize request body
    if (req.body) {
      req.body = sanitizeInput(req.body)
    }
    next()
  }
]

Database Security

-- Create restricted database user for Botpress
CREATE USER botpress_user WITH PASSWORD 'secure_password';
GRANT CONNECT ON DATABASE botpress_prod TO botpress_user;
GRANT USAGE ON SCHEMA public TO botpress_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO botpress_user;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO botpress_user;

-- Enable row-level security if needed
ALTER TABLE conversations ENABLE ROW LEVEL SECURITY;
CREATE POLICY user_conversations ON conversations
  FOR ALL USING (user_id = current_user_id());

📊 Monitoring & Observability

Application Monitoring

// Production monitoring setup
const monitoringSetup = {
  sentry: {
    dsn: process.env.SENTRY_DSN,
    tracesSampleRate: 0.1,
    environment: 'production'
  },

  metrics: {
    collectDefaultMetrics: true,
    prefix: 'botpress_'
  },

  logging: {
    level: 'warn',
    format: 'json',
    destination: process.env.LOGGING_SERVICE_URL
  }
}

// Initialize monitoring
const initMonitoring = () => {
  // Sentry for error tracking
  Sentry.init(monitoringSetup.sentry)

  // Prometheus metrics
  const register = new promClient.Registry()
  register.setDefaultLabels({ app: 'botpress' })
  promClient.collectDefaultMetrics({ register })

  // Winston for logging
  const logger = winston.createLogger({
    level: monitoringSetup.logging.level,
    format: winston.format.json(),
    transports: [
      new winston.transports.Console(),
      new winston.transports.Http({
        host: monitoringSetup.logging.destination,
        path: '/logs'
      })
    ]
  })

  return { register, logger }
}

Health Checks

// Comprehensive health checks
app.get('/health', async (req, res) => {
  const health = {
    status: 'ok',
    timestamp: new Date().toISOString(),
    uptime: process.uptime(),
    checks: {}
  }

  try {
    // Database health
    await db.query('SELECT 1')
    health.checks.database = { status: 'ok' }
  } catch (error) {
    health.checks.database = { status: 'error', error: error.message }
    health.status = 'error'
  }

  try {
    // Redis health
    await redis.ping()
    health.checks.redis = { status: 'ok' }
  } catch (error) {
    health.checks.redis = { status: 'error', error: error.message }
    health.status = 'error'
  }

  try {
    // NLU health
    const result = await nlu.predict('hello')
    health.checks.nlu = { status: 'ok', confidence: result.confidence }
  } catch (error) {
    health.checks.nlu = { status: 'error', error: error.message }
    health.status = 'error'
  }

  res.status(health.status === 'ok' ? 200 : 503).json(health)
})

🚀 Scaling Strategies

Horizontal Scaling

# Scaled deployment with load balancer
version: '3.8'
services:
  botpress-1:
    image: botpress/server:latest
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
    deploy:
      replicas: 3
    networks:
      - botpress_network

  botpress-2:
    image: botpress/server:latest
    environment:
      - DATABASE_URL=${DATABASE_URL}
      - REDIS_URL=${REDIS_URL}
    deploy:
      replicas: 3
    networks:
      - botpress_network

  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf
      - ./ssl:/etc/ssl/certs
    depends_on:
      - botpress-1
      - botpress-2
    networks:
      - botpress_network

Database Scaling

// Connection pooling for database scaling
const dbConfig = {
  host: process.env.DB_HOST,
  port: process.env.DB_PORT,
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  max: 20,              // Maximum connections
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
  ssl: process.env.NODE_ENV === 'production'
}

const pool = new Pool(dbConfig)

// Connection monitoring
pool.on('connect', (client) => {
  console.log('New database connection established')
})

pool.on('error', (err, client) => {
  console.error('Unexpected error on idle client', err)
})

💾 Backup & Recovery

Automated Backups

#!/bin/bash
# backup.sh

BACKUP_DIR="/var/backups/botpress"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_NAME="botpress_$TIMESTAMP"

# Create backup directory
mkdir -p $BACKUP_DIR

# Database backup
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME > $BACKUP_DIR/${BACKUP_NAME}_db.sql

# File system backup
tar -czf $BACKUP_DIR/${BACKUP_NAME}_files.tar.gz /var/lib/botpress/data

# Upload to cloud storage
aws s3 cp $BACKUP_DIR/${BACKUP_NAME}_db.sql s3://botpress-backups/
aws s3 cp $BACKUP_DIR/${BACKUP_NAME}_files.tar.gz s3://botpress-backups/

# Clean old backups (keep last 30 days)
find $BACKUP_DIR -name "*.sql" -mtime +30 -delete
find $BACKUP_DIR -name "*.tar.gz" -mtime +30 -delete

Recovery Procedure

#!/bin/bash
# restore.sh

BACKUP_NAME=$1

if [ -z "$BACKUP_NAME" ]; then
  echo "Usage: $0 <backup_name>"
  exit 1
fi

# Stop Botpress
docker-compose down

# Restore database
psql -h $DB_HOST -U $DB_USER -d $DB_NAME < /var/backups/botpress/${BACKUP_NAME}_db.sql

# Restore files
tar -xzf /var/backups/botpress/${BACKUP_NAME}_files.tar.gz -C /var/lib/botpress/

# Start Botpress
docker-compose up -d

echo "Recovery completed"

🔄 Deployment Pipeline

CI/CD Setup

# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Setup Node.js
        uses: actions/setup-node@v2
        with:
          node-version: '16'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
      - name: Build
        run: npm run build

  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to production
        run: |
          echo "Deploying to production..."
          # Add your deployment commands here

Blue-Green Deployment

// Blue-green deployment strategy
class BlueGreenDeployment {
  constructor() {
    this.active = 'blue'
    this.inactive = 'green'
  }

  async deploy(newVersion) {
    try {
      // Deploy to inactive environment
      await this.deployToEnvironment(this.inactive, newVersion)

      // Run health checks
      await this.healthCheck(this.inactive)

      // Switch traffic
      await this.switchTraffic()

      // Keep old version for rollback
      this.previousVersion = this.active

    } catch (error) {
      console.error('Deployment failed:', error)
      await this.rollback()
    }
  }

  async rollback() {
    if (this.previousVersion) {
      await this.switchTraffic(this.previousVersion)
      console.log('Rolled back to previous version')
    }
  }
}

📊 Performance Optimization

Caching Strategy

// Multi-level caching for production
const cacheConfig = {
  memory: {
    ttl: 300,        // 5 minutes
    max: 1000       // Maximum items
  },
  redis: {
    ttl: 3600,      // 1 hour
    prefix: 'botpress:'
  },
  cdn: {
    ttl: 86400,     // 1 day
    provider: 'cloudflare'
  }
}

const multiLevelCache = {
  async get(key) {
    // Try memory cache first
    let value = await memoryCache.get(key)
    if (value) return value

    // Try Redis
    value = await redisCache.get(key)
    if (value) {
      // Populate memory cache
      await memoryCache.set(key, value, cacheConfig.memory.ttl)
      return value
    }

    return null
  },

  async set(key, value, ttl = cacheConfig.redis.ttl) {
    await Promise.all([
      memoryCache.set(key, value, cacheConfig.memory.ttl),
      redisCache.set(key, value, ttl)
    ])
  }
}

🚨 Incident Response

Alert Configuration

// Production alerting
const alertConfig = {
  channels: {
    slack: {
      webhook: process.env.SLACK_WEBHOOK,
      channel: '#alerts'
    },
    email: {
      smtp: process.env.SMTP_CONFIG,
      recipients: ['team@company.com']
    }
  },

  thresholds: {
    errorRate: 0.05,      // 5% error rate
    responseTime: 5000,   // 5 second response time
    memoryUsage: 0.9      // 90% memory usage
  }
}

const alertManager = {
  async sendAlert(type, message, details) {
    const promises = []

    if (alertConfig.channels.slack) {
      promises.push(this.sendSlackAlert(type, message, details))
    }

    if (alertConfig.channels.email) {
      promises.push(this.sendEmailAlert(type, message, details))
    }

    await Promise.all(promises)
  },

  async checkThresholds(metrics) {
    if (metrics.errorRate > alertConfig.thresholds.errorRate) {
      await this.sendAlert('error', 'High error rate detected', metrics)
    }

    if (metrics.avgResponseTime > alertConfig.thresholds.responseTime) {
      await this.sendAlert('warning', 'High response time detected', metrics)
    }
  }
}

📝 Production Checklist

# Pre-deployment checklist
PRODUCTION_CHECKLIST="
☐ Environment variables configured
☐ Database connection tested
☐ Redis connection established
☐ SSL certificates installed
☐ Firewall rules configured
☐ Monitoring tools set up
☐ Backup strategy implemented
☐ Health checks working
☐ Load balancer configured
☐ CDN set up for static assets
☐ Alert system tested
☐ Rollback plan documented
☐ Team notified of deployment
"

📝 Chapter Summary

  • ✅ Configured production-ready architecture
  • ✅ Implemented security hardening measures
  • ✅ Set up comprehensive monitoring
  • ✅ Created scaling strategies
  • ✅ Established backup and recovery procedures
  • ✅ Built deployment pipeline
  • ✅ Configured incident response system

Key Takeaways:

  • Production requires careful security configuration
  • Monitoring is critical for maintaining reliability
  • Scaling should be planned for growth
  • Backups and recovery are essential for business continuity
  • CI/CD enables reliable deployments
  • Incident response minimizes downtime
  • Performance optimization ensures user satisfaction

What Problem Does This Solve?

Most teams struggle here because the hard part is not writing more code, but deciding clear boundaries for error, botpress, redis so behavior stays predictable as complexity grows.

In practical terms, this chapter helps you avoid three common failures:

  • coupling core logic too tightly to one implementation path
  • missing the handoff boundaries between setup, execution, and validation
  • shipping changes without clear rollback or observability strategy

After working through this chapter, you should be able to reason about Chapter 8: Production Deployment as an operating subsystem inside Botpress Tutorial: Open Source Conversational AI Platform, with explicit contracts for inputs, state transitions, and outputs.

Use the implementation notes around health, process, status as your checklist when adapting these patterns to your own repository.

How it Works Under the Hood

Under the hood, Chapter 8: Production Deployment usually follows a repeatable control path:

  1. Context bootstrap: initialize runtime config and prerequisites for error.
  2. Input normalization: shape incoming data so botpress receives stable contracts.
  3. Core execution: run the main logic branch and propagate intermediate state through redis.
  4. Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
  5. Output composition: return canonical result payloads for downstream consumers.
  6. Operational telemetry: emit logs/metrics needed for debugging and performance tuning.

When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.

Source Walkthrough

Use the following upstream sources to verify implementation details while reading this chapter:

Suggested trace strategy:

  • search upstream code for error and botpress to map concrete implementation paths
  • compare docs claims against actual runtime/config code before reusing patterns in production

Chapter Connections