Skip to content

Latest commit

 

History

History
772 lines (611 loc) · 22.3 KB

File metadata and controls

772 lines (611 loc) · 22.3 KB
layout default
title Chapter 8: Production Deployment
parent Perplexica Tutorial
nav_order 8

Chapter 8: Production Deployment

Welcome to Chapter 8: Production Deployment. In this part of Perplexica Tutorial: AI-Powered Search Engine, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.

Deploying Perplexica for production use requires careful planning across infrastructure, security, monitoring, and cost management. This chapter covers everything from single-command Docker deployment to multi-instance scaling with load balancing, SSL termination, API key security, observability, and cost optimization. By the end, you will have a production-ready deployment playbook.

Deployment Architecture

flowchart TD
    subgraph Internet["Internet"]
        U[Users]
        DNS[DNS - yourdomain.com]
    end

    subgraph Edge["Edge Layer"]
        CF[Cloudflare / CDN]
        SSL[SSL Termination]
    end

    subgraph Proxy["Reverse Proxy"]
        NX[Nginx / Caddy]
    end

    subgraph App["Application Tier"]
        P1[Perplexica Instance 1]
        P2[Perplexica Instance 2]
    end

    subgraph Search["Search Tier"]
        SX1[SearXNG Instance 1]
        SX2[SearXNG Instance 2]
    end

    subgraph Data["Data Tier"]
        DB[(SQLite / PostgreSQL)]
        VOL[Persistent Volume]
    end

    subgraph AI["AI Providers"]
        OA[OpenAI API]
        AN[Anthropic API]
        OL[Ollama - Local]
    end

    U --> DNS --> CF --> SSL --> NX
    NX --> P1 & P2
    P1 & P2 --> SX1 & SX2
    P1 & P2 --> DB
    DB --> VOL
    P1 & P2 --> OA & AN & OL

    classDef internet fill:#e1f5fe,stroke:#01579b
    classDef edge fill:#f3e5f5,stroke:#4a148c
    classDef proxy fill:#fff3e0,stroke:#ef6c00
    classDef app fill:#e8f5e8,stroke:#1b5e20
    classDef search fill:#e0f2f1,stroke:#004d40
    classDef data fill:#fce4ec,stroke:#b71c1c
    classDef ai fill:#f1f8e9,stroke:#33691e

    class U,DNS internet
    class CF,SSL edge
    class NX proxy
    class P1,P2 app
    class SX1,SX2 search
    class DB,VOL data
    class OA,AN,OL ai
Loading

Docker Deployment

Single-Command Quick Start

Starting with v1.11, Perplexica supports single-command Docker deployment without cloning the full repository.

# Quick start with Docker Compose (no repo clone needed)
docker run -d \
  --name perplexica \
  -p 3000:3000 \
  -p 3001:3001 \
  -v perplexica-data:/app/data \
  -v perplexica-config:/app/config \
  itzcrazykns/perplexica:latest

Full Docker Compose Setup

# docker-compose.yaml
version: '3.8'

services:
  perplexica-frontend:
    image: itzcrazykns/perplexica-frontend:latest
    build:
      context: ./ui
      dockerfile: Dockerfile
    ports:
      - "3000:3000"
    environment:
      - NEXT_PUBLIC_API_URL=http://perplexica-backend:3001
      - NEXT_PUBLIC_WS_URL=ws://perplexica-backend:3001
    depends_on:
      - perplexica-backend
    restart: unless-stopped
    networks:
      - perplexica-net

  perplexica-backend:
    image: itzcrazykns/perplexica-backend:latest
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "3001:3001"
    volumes:
      - perplexica-data:/app/data
      - ./config.toml:/app/config.toml
    environment:
      - NODE_ENV=production
      - PORT=3001
    depends_on:
      - searxng
    restart: unless-stopped
    networks:
      - perplexica-net

  searxng:
    image: searxng/searxng:latest
    container_name: perplexica-searxng
    ports:
      - "32768:8080"
    volumes:
      - ./searxng:/etc/searxng
    environment:
      - SEARXNG_BASE_URL=http://localhost:32768/
    restart: unless-stopped
    networks:
      - perplexica-net

volumes:
  perplexica-data:
    driver: local

networks:
  perplexica-net:
    driver: bridge

Production Dockerfile

# Dockerfile (backend)
FROM node:20-alpine AS builder

WORKDIR /app

# Install dependencies
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile --production=false

# Build application
COPY . .
RUN yarn build

# Production image
FROM node:20-alpine AS runner

WORKDIR /app

# Copy built application
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package.json ./

# Create data directory
RUN mkdir -p /app/data

# Non-root user for security
RUN addgroup --system --gid 1001 perplexica && \
    adduser --system --uid 1001 perplexica && \
    chown -R perplexica:perplexica /app

USER perplexica

EXPOSE 3001

HEALTHCHECK --interval=30s --timeout=5s --start-period=10s \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3001/api/health || exit 1

CMD ["node", "dist/index.js"]

Reverse Proxy Configuration

Nginx Configuration

# /etc/nginx/sites-available/perplexica.conf
upstream perplexica_backend {
    server 127.0.0.1:3001;
    # Add more instances for load balancing:
    # server 127.0.0.1:3002;
    # server 127.0.0.1:3003;
}

upstream perplexica_frontend {
    server 127.0.0.1:3000;
}

server {
    listen 80;
    server_name search.yourdomain.com;
    return 301 https://$server_name$request_uri;
}

server {
    listen 443 ssl http2;
    server_name search.yourdomain.com;

    ssl_certificate /etc/letsencrypt/live/search.yourdomain.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/search.yourdomain.com/privkey.pem;

    # SSL hardening
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256;
    ssl_prefer_server_ciphers off;
    ssl_session_cache shared:SSL:10m;

    # Security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;

    # Frontend
    location / {
        proxy_pass http://perplexica_frontend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    # Backend API
    location /api/ {
        proxy_pass http://perplexica_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

        # Rate limiting
        limit_req zone=api burst=20 nodelay;
    }

    # WebSocket
    location /ws {
        proxy_pass http://perplexica_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_read_timeout 86400;
    }
}

# Rate limiting zone
limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s;

Caddy Configuration (Alternative)

# Caddyfile
search.yourdomain.com {
    # Automatic HTTPS via Let's Encrypt

    # Frontend
    handle {
        reverse_proxy localhost:3000
    }

    # Backend API
    handle /api/* {
        reverse_proxy localhost:3001
    }

    # WebSocket
    handle /ws {
        reverse_proxy localhost:3001
    }

    # Security headers
    header {
        X-Frame-Options "SAMEORIGIN"
        X-Content-Type-Options "nosniff"
        Strict-Transport-Security "max-age=31536000; includeSubDomains"
    }
}

Security Considerations

API Key Protection

flowchart TD
    subgraph Storage["Key Storage"]
        CT[config.toml - file permissions 600]
        EV[Environment Variables]
        VS[Vault / Secrets Manager]
    end

    subgraph Access["Access Control"]
        RD[Redact in API responses]
        NL[Never log API keys]
        RT[Key rotation schedule]
    end

    subgraph Network["Network Security"]
        SSL[TLS 1.2+ only]
        FW[Firewall rules]
        RL[Rate limiting]
        CO[CORS policy]
    end

    Storage --> Access --> Network

    classDef storage fill:#fce4ec,stroke:#b71c1c
    classDef access fill:#fff3e0,stroke:#ef6c00
    classDef network fill:#e8f5e8,stroke:#1b5e20

    class CT,EV,VS storage
    class RD,NL,RT access
    class SSL,FW,RL,CO network
Loading

Security Hardening Checklist

Area Action Priority
API Keys Store in config.toml with 600 permissions, never commit to git Critical
HTTPS Enforce TLS 1.2+ with valid certificate Critical
Rate Limiting 10 req/s per IP on API, 2 req/s per IP on search High
CORS Restrict to your domain only High
File Uploads Validate file type, enforce 10MB limit, scan for malware High
Headers X-Frame-Options, CSP, HSTS Medium
Authentication Add user auth if exposed to internet (not built-in) Medium
Logging Audit log API access, redact sensitive data Medium
Updates Automated security updates for Docker images Medium
Backup Daily database backup with encryption Medium

Adding Basic Authentication

Perplexica does not include built-in authentication, but you can add it at the reverse proxy layer.

# Generate htpasswd file
sudo apt install apache2-utils
htpasswd -c /etc/nginx/.htpasswd admin
# Add to nginx location blocks
location / {
    auth_basic "Perplexica";
    auth_basic_user_file /etc/nginx/.htpasswd;

    proxy_pass http://perplexica_frontend;
    # ... other proxy settings
}

Monitoring and Observability

Health Check Endpoint

// src/routes/health.ts
import { Router } from 'express';

const router = Router();

router.get('/health', async (_req, res) => {
  const checks: Record<string, 'ok' | 'error'> = {};

  // Check database
  try {
    await db.select().from(chats).limit(1);
    checks.database = 'ok';
  } catch {
    checks.database = 'error';
  }

  // Check SearXNG
  try {
    const response = await fetch(`${getSearXNGUrl()}/healthz`, {
      signal: AbortSignal.timeout(5000),
    });
    checks.searxng = response.ok ? 'ok' : 'error';
  } catch {
    checks.searxng = 'error';
  }

  // Check at least one AI provider
  try {
    const providers = await getAvailableChatModelProviders();
    const hasProvider = Object.values(providers).some(
      (p) => Object.keys(p).length > 0,
    );
    checks.aiProviders = hasProvider ? 'ok' : 'error';
  } catch {
    checks.aiProviders = 'error';
  }

  const allHealthy = Object.values(checks).every((c) => c === 'ok');

  res.status(allHealthy ? 200 : 503).json({
    status: allHealthy ? 'healthy' : 'degraded',
    checks,
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
  });
});

export default router;

Prometheus Metrics

// src/lib/metrics.ts
import { Counter, Histogram, Gauge, register } from 'prom-client';

// Search metrics
export const searchRequestsTotal = new Counter({
  name: 'perplexica_search_requests_total',
  help: 'Total number of search requests',
  labelNames: ['focus_mode', 'status'],
});

export const searchLatency = new Histogram({
  name: 'perplexica_search_latency_seconds',
  help: 'Search request latency in seconds',
  labelNames: ['focus_mode'],
  buckets: [0.5, 1, 2, 5, 10, 30],
});

export const llmTokensUsed = new Counter({
  name: 'perplexica_llm_tokens_total',
  help: 'Total LLM tokens consumed',
  labelNames: ['provider', 'model', 'type'],
});

export const activeWebSockets = new Gauge({
  name: 'perplexica_active_websockets',
  help: 'Number of active WebSocket connections',
});

export const searxngLatency = new Histogram({
  name: 'perplexica_searxng_latency_seconds',
  help: 'SearXNG search latency',
  buckets: [0.1, 0.25, 0.5, 1, 2, 5],
});

// Metrics endpoint
export function getMetrics() {
  return register.metrics();
}

Monitoring Dashboard Metrics

Metric Description Alert Threshold
Search latency (p95) 95th percentile search response time > 10 seconds
Error rate Percentage of failed search requests > 5%
Active WebSockets Current concurrent connections > 100 per instance
LLM tokens/hour AI provider token consumption rate > budget limit
SearXNG latency Meta-search response time > 3 seconds
Database size SQLite file size on disk > 1 GB
Memory usage Container memory consumption > 80% of limit
CPU usage Container CPU utilization > 70% sustained

Scaling Strategy

Horizontal Scaling with Docker Compose

# docker-compose.prod.yaml
version: '3.8'

services:
  perplexica-backend:
    image: itzcrazykns/perplexica-backend:latest
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
        reservations:
          cpus: '0.5'
          memory: 512M
    environment:
      - NODE_ENV=production
      - PORT=3001
    volumes:
      - shared-data:/app/data
      - ./config.toml:/app/config.toml:ro
    networks:
      - perplexica-net

  searxng:
    image: searxng/searxng:latest
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
    volumes:
      - ./searxng:/etc/searxng:ro
    networks:
      - perplexica-net

volumes:
  shared-data:
    driver: local

networks:
  perplexica-net:
    driver: bridge

Cloud Deployment Options

Platform Service Advantages Estimated Cost
AWS ECS Fargate Auto-scaling, no server management $50-200/month
Google Cloud Cloud Run Scale to zero, pay-per-request $30-150/month
DigitalOcean App Platform Simple deployment, predictable pricing $20-100/month
Hetzner VPS + Docker Best price/performance for EU $10-50/month
Self-hosted Docker Compose Full control, no recurring cloud cost Hardware cost only

Cost Optimization

AI Provider Cost Management

flowchart LR
    subgraph Query["Incoming Query"]
        Q[User Query]
    end

    subgraph Router["Cost-Aware Router"]
        CX{Query Complexity?}
    end

    subgraph Models["Model Selection"]
        CHEAP[Groq / Haiku - $0.001/query]
        MID[GPT-4o Mini / Sonnet - $0.01/query]
        FULL[GPT-4o / Opus - $0.05/query]
        LOCAL[Ollama - Free]
    end

    Q --> CX
    CX -->|Simple factual| CHEAP
    CX -->|Moderate analysis| MID
    CX -->|Complex research| FULL
    CX -->|Development/testing| LOCAL

    classDef query fill:#e1f5fe,stroke:#01579b
    classDef router fill:#fff3e0,stroke:#ef6c00
    classDef model fill:#e8f5e8,stroke:#1b5e20

    class Q query
    class CX router
    class CHEAP,MID,FULL,LOCAL model
Loading

Cost Optimization Strategies

Strategy Implementation Monthly Savings
Use cheap models for query reformulation Route reformulation step to Haiku/Mini 60-80% on reformulation
Cache search results (15-min TTL) In-memory cache by query hash 30-50% on SearXNG load
Cache LLM responses (1-hour TTL) Redis/memory cache for repeated queries 20-40% on LLM costs
Rate limit per user Nginx rate limiting + token budgets Prevents runaway costs
Use Ollama for dev/staging Local models for non-production 100% savings in dev
Batch embedding requests Group documents before embedding call 20-30% on embeddings
Monitor and alert on spend Track tokens/hour, set budget alerts Avoids surprise bills

Budget Monitoring

// src/lib/budget.ts
interface BudgetConfig {
  dailyLimitUSD: number;
  alertThresholdPercent: number;
}

class BudgetMonitor {
  private dailySpend: number = 0;
  private lastReset: Date = new Date();
  private config: BudgetConfig;

  constructor(config: BudgetConfig) {
    this.config = config;
  }

  trackUsage(tokens: number, costPerToken: number): void {
    this.resetIfNewDay();
    this.dailySpend += tokens * costPerToken;

    const percentUsed = (this.dailySpend / this.config.dailyLimitUSD) * 100;

    if (percentUsed >= this.config.alertThresholdPercent) {
      console.warn(
        `Budget alert: ${percentUsed.toFixed(1)}% of daily limit used ($${this.dailySpend.toFixed(4)} / $${this.config.dailyLimitUSD})`,
      );
    }
  }

  canProceed(): boolean {
    this.resetIfNewDay();
    return this.dailySpend < this.config.dailyLimitUSD;
  }

  private resetIfNewDay(): void {
    const now = new Date();
    if (now.toDateString() !== this.lastReset.toDateString()) {
      this.dailySpend = 0;
      this.lastReset = now;
    }
  }
}

Backup and Recovery

Database Backup Script

#!/bin/bash
# backup.sh - Run daily via cron

BACKUP_DIR="/backups/perplexica"
DB_PATH="/app/data/perplexica.db"
DATE=$(date +%Y%m%d_%H%M%S)

# Create backup directory
mkdir -p "$BACKUP_DIR"

# SQLite online backup (safe for concurrent access)
sqlite3 "$DB_PATH" ".backup '$BACKUP_DIR/perplexica_$DATE.db'"

# Compress
gzip "$BACKUP_DIR/perplexica_$DATE.db"

# Retain last 30 days
find "$BACKUP_DIR" -name "*.gz" -mtime +30 -delete

echo "Backup completed: perplexica_$DATE.db.gz"
# Add to crontab
# crontab -e
0 2 * * * /app/scripts/backup.sh >> /var/log/perplexica-backup.log 2>&1

Production Readiness Checklist

Category Item Status
Infrastructure Docker containers configured with resource limits Required
Infrastructure Persistent volume for SQLite database Required
Infrastructure Reverse proxy with SSL termination Required
Infrastructure Health check endpoint configured Required
Security API keys stored securely (file permissions 600) Required
Security HTTPS enforced with valid certificate Required
Security Rate limiting configured Required
Security Security headers set Recommended
Security Authentication layer for public deployments Recommended
Monitoring Health check monitoring Required
Monitoring Log aggregation Recommended
Monitoring Performance metrics (Prometheus) Recommended
Monitoring Budget alerts for AI provider costs Recommended
Reliability Automated database backups Required
Reliability Container auto-restart policy Required
Reliability Graceful shutdown handling Recommended
Performance Response caching configured Recommended
Performance SearXNG replicas for redundancy Recommended
Performance Database WAL mode enabled Recommended

Summary

Deploying Perplexica in production is a matter of wrapping the Docker containers with the right operational infrastructure: a reverse proxy for SSL and load balancing, persistent storage for the database, security hardening for API keys and network access, monitoring for health and performance, and cost controls for AI provider spending. The single-command Docker deployment makes getting started easy, while the modular architecture allows you to scale each tier independently as usage grows.

Key Takeaways

  • Docker-first deployment -- Perplexica ships with Docker Compose files that handle the frontend, backend, and SearXNG in a single command.
  • Reverse proxy is essential -- Nginx or Caddy provides SSL termination, WebSocket proxying, rate limiting, and load balancing.
  • Security is layered -- file permissions for API keys, HTTPS everywhere, rate limiting, security headers, and optional authentication at the proxy layer.
  • Monitoring drives reliability -- health checks, Prometheus metrics, and budget alerts keep the system observable and costs predictable.
  • Cost-aware model routing -- sending simple queries to cheap/fast models and reserving expensive models for complex research queries can reduce AI costs by 60-80%.
  • Backup is non-negotiable -- daily SQLite backups with 30-day retention protect against data loss.

Congratulations

You have completed the Perplexica tutorial. You now have the knowledge to install, configure, extend, and deploy an AI-powered search engine. Whether you are building a personal research assistant, a team knowledge tool, or a customer-facing search product, Perplexica's open-source architecture gives you full control over every layer of the stack.

Go back to the Tutorial Index to review any chapter.


Built with insights from the Perplexica project.

What Problem Does This Solve?

Most teams struggle here because the hard part is not writing more code, but deciding clear boundaries for perplexica, config, searxng so behavior stays predictable as complexity grows.

In practical terms, this chapter helps you avoid three common failures:

  • coupling core logic too tightly to one implementation path
  • missing the handoff boundaries between setup, execution, and validation
  • shipping changes without clear rollback or observability strategy

After working through this chapter, you should be able to reason about Chapter 8: Production Deployment as an operating subsystem inside Perplexica Tutorial: AI-Powered Search Engine, with explicit contracts for inputs, state transitions, and outputs.

Use the implementation notes around subgraph, classDef, fill as your checklist when adapting these patterns to your own repository.

How it Works Under the Hood

Under the hood, Chapter 8: Production Deployment usually follows a repeatable control path:

  1. Context bootstrap: initialize runtime config and prerequisites for perplexica.
  2. Input normalization: shape incoming data so config receives stable contracts.
  3. Core execution: run the main logic branch and propagate intermediate state through searxng.
  4. Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
  5. Output composition: return canonical result payloads for downstream consumers.
  6. Operational telemetry: emit logs/metrics needed for debugging and performance tuning.

When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.

Source Walkthrough

Use the following upstream sources to verify implementation details while reading this chapter:

  • View Repo Why it matters: authoritative reference on View Repo (github.com).

Suggested trace strategy:

  • search upstream code for perplexica and config to map concrete implementation paths
  • compare docs claims against actual runtime/config code before reusing patterns in production

Chapter Connections