---
layout: default
title: "Chapter 8: Production Deployment"
parent: Firecrawl Tutorial
nav_order: 8
---
Welcome to Chapter 8: Production Deployment. In this part of Firecrawl Tutorial: Building LLM-Ready Web Scraping and Data Extraction Systems, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.
You have built a scraping and RAG pipeline that works on your laptop. Now it is time to deploy it to production -- where it must run 24/7 with high reliability, proper security, automated recovery, and full observability. This chapter covers everything from containerization to Kubernetes deployment, from secrets management to disaster recovery.
Production deployment is where the rubber meets the road. A system that works in development but fails in production provides zero value. This chapter ensures your Firecrawl-based pipeline survives the real world.
| Skill | Description |
|---|---|
| Containerization | Package workers and services in Docker containers |
| Orchestration | Deploy with Kubernetes and autoscaling |
| Secrets management | Secure API keys and credentials |
| CI/CD pipelines | Automate testing, building, and deployment |
| Observability | Monitor logs, metrics, and traces in production |
| Backup and recovery | Protect data and recover from failures |
| Operational runbooks | Respond to incidents systematically |
A production deployment separates concerns into independently deployable services, each with its own scaling policy and failure domain.
```mermaid
flowchart TD
    subgraph Ingress["Ingress Layer"]
        A[Load Balancer] --> B[API Gateway]
    end
    subgraph App["Application Layer"]
        B --> C[Scheduler Service]
        C --> D[(Redis Queue)]
        D --> E[Worker Pool]
        E --> F[Firecrawl API]
    end
    subgraph Data["Data Layer"]
        E --> G[(PostgreSQL)]
        E --> H[(Redis Cache)]
        E --> I[(Vector DB)]
    end
    subgraph Ops["Operations"]
        J[Prometheus] --> K[Grafana]
        L[Loki] --> K
        E --> J
        E --> L
        M[Alertmanager] --> N[PagerDuty/Slack]
        J --> M
    end
    classDef ingress fill:#e1f5fe,stroke:#01579b
    classDef app fill:#f3e5f5,stroke:#4a148c
    classDef data fill:#e8f5e8,stroke:#1b5e20
    classDef ops fill:#fff3e0,stroke:#e65100
    class A,B ingress
    class C,D,E,F app
    class G,H,I data
    class J,K,L,M,N ops
```
```dockerfile
# Multi-stage build for a smaller image
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --user -r requirements.txt

FROM python:3.11-slim
# Create a non-root user
RUN useradd --create-home --shell /bin/bash worker
WORKDIR /app
# Copy installed packages from the builder stage
COPY --from=builder /root/.local /home/worker/.local
ENV PATH=/home/worker/.local/bin:$PATH
# Copy application code
COPY src/ ./src/
COPY config/ ./config/
# Switch to the non-root user
USER worker
# Health check (raise_for_status ensures non-200 responses count as failures)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import requests; requests.get('http://localhost:8080/health', timeout=5).raise_for_status()" || exit 1
# Run the worker
CMD ["python", "src/worker.py"]
```
```yaml
# docker-compose.yml
version: "3.8"

services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: firecrawl_pipeline
      POSTGRES_USER: pipeline
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    ports:
      - "5432:5432"
    volumes:
      - pg_data:/var/lib/postgresql/data

  worker:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      FIRECRAWL_API_KEY: ${FIRECRAWL_API_KEY}
      REDIS_URL: redis://redis:6379
      DATABASE_URL: postgresql://pipeline:${POSTGRES_PASSWORD}@postgres:5432/firecrawl_pipeline
      WORKER_CONCURRENCY: "5"
      LOG_LEVEL: info
    depends_on:
      - redis
      - postgres
    deploy:
      replicas: 3
    restart: unless-stopped

  scheduler:
    build:
      context: .
      dockerfile: Dockerfile
    command: python src/scheduler.py
    environment:
      REDIS_URL: redis://redis:6379
      DATABASE_URL: postgresql://pipeline:${POSTGRES_PASSWORD}@postgres:5432/firecrawl_pipeline
    depends_on:
      - redis
      - postgres
    restart: unless-stopped

volumes:
  redis_data:
  pg_data:
```
```python
# src/worker.py
import logging
import os
import signal
import sys
import time

from firecrawl import FirecrawlApp

# ScrapeJobQueue is the Redis-backed job queue built earlier in this
# tutorial; adjust this import to match your own module layout.
from scrape_queue import ScrapeJobQueue

logging.basicConfig(
    level=os.getenv("LOG_LEVEL", "INFO").upper(),
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("worker")

# Graceful shutdown flag, flipped by SIGTERM/SIGINT
shutdown_requested = False

def handle_shutdown(signum, frame):
    global shutdown_requested
    logger.info("Shutdown signal received, finishing current job...")
    shutdown_requested = True

signal.signal(signal.SIGTERM, handle_shutdown)
signal.signal(signal.SIGINT, handle_shutdown)

def main():
    # Load configuration from the environment
    api_key = os.environ["FIRECRAWL_API_KEY"]
    redis_url = os.environ["REDIS_URL"]
    concurrency = int(os.getenv("WORKER_CONCURRENCY", "5"))

    app = FirecrawlApp(api_key=api_key)
    queue = ScrapeJobQueue(redis_url=redis_url)
    logger.info(f"Worker started with concurrency={concurrency}")

    while not shutdown_requested:
        job = queue.dequeue()
        if job is None:
            time.sleep(1)
            continue

        url = job["url"]
        logger.info(f"Processing: {url}")
        try:
            result = app.scrape_url(url, params={
                "formats": ["markdown"],
                "onlyMainContent": True,
                "timeout": 30000,
            })
            queue.complete(job, {
                "content_length": len(result.get("markdown", "")),
                "title": result.get("metadata", {}).get("title", ""),
            })
            logger.info(f"Completed: {url}")
        except Exception as exc:
            queue.fail(job, str(exc))
            logger.error(f"Failed: {url}: {exc}")

    logger.info("Worker shutdown complete")
    sys.exit(0)

if __name__ == "__main__":
    main()
```
```yaml
# k8s/worker-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scrape-worker
  labels:
    app: scrape-worker
spec:
  replicas: 3
  selector:
    matchLabels:
      app: scrape-worker
  template:
    metadata:
      labels:
        app: scrape-worker
    spec:
      containers:
        - name: worker
          image: your-registry/scrape-worker:latest
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          env:
            - name: FIRECRAWL_API_KEY
              valueFrom:
                secretKeyRef:
                  name: firecrawl-secrets
                  key: api-key
            - name: REDIS_URL
              valueFrom:
                configMapKeyRef:
                  name: app-config
                  key: redis-url
            - name: WORKER_CONCURRENCY
              value: "5"
            - name: LOG_LEVEL
              value: "info"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 30
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
      restartPolicy: Always
```
```yaml
# k8s/worker-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: scrape-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: scrape-worker
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: External
      external:
        metric:
          name: redis_queue_depth
          selector:
            matchLabels:
              queue: scrape-pending
        target:
          type: AverageValue
          averageValue: "100"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 25
          periodSeconds: 120
```
```yaml
# k8s/secrets.yaml (encrypt with sealed-secrets or use external-secrets)
apiVersion: v1
kind: Secret
metadata:
  name: firecrawl-secrets
type: Opaque
stringData:
  api-key: "fc-your-api-key-here"
  openai-key: "sk-your-openai-key-here"
  postgres-password: "your-secure-password"
---
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  redis-url: "redis://redis-master.default.svc.cluster.local:6379"
  database-url: "postgresql://pipeline@postgres.default.svc.cluster.local:5432/firecrawl_pipeline"
  worker-concurrency: "5"
  log-level: "info"
  cache-ttl-hours: "24"
```
```yaml
# k8s/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: scrape-worker-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: scrape-worker
```

Never store secrets in code, config files, or container images. Use a dedicated secrets manager.
```mermaid
flowchart TD
    A[Application] --> B{Secret Source}
    B -- Development --> C[.env File]
    B -- Staging --> D[Kubernetes Secrets]
    B -- Production --> E[External Secrets Manager]
    E --> F[AWS Secrets Manager]
    E --> G[HashiCorp Vault]
    E --> H[GCP Secret Manager]
    F --> I[Auto-rotation]
    G --> I
    H --> I
    classDef dev fill:#e8f5e8,stroke:#1b5e20
    classDef staging fill:#fff3e0,stroke:#e65100
    classDef prod fill:#e1f5fe,stroke:#01579b
    class C dev
    class D staging
    class E,F,G,H,I prod
```
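Whichever backend you choose, secrets typically reach a Kubernetes pod either as environment variables or as files mounted into the container. A minimal helper that checks both sources might look like this sketch; the `/etc/secrets` mount path and the env-to-filename convention are assumptions, not something configured elsewhere in this chapter:

```python
import os
from pathlib import Path

def read_secret(name: str, mount_dir: str = "/etc/secrets") -> str:
    """Read a secret from an env var, falling back to a mounted file.

    mount_dir is illustrative; match it to the volumeMount path in
    your pod spec. File names are assumed to be lowercase-hyphenated
    versions of the env var name (FIRECRAWL_API_KEY -> firecrawl-api-key).
    """
    value = os.environ.get(name)
    if value:
        return value
    secret_file = Path(mount_dir) / name.lower().replace("_", "-")
    if secret_file.exists():
        return secret_file.read_text().strip()
    raise KeyError(f"Secret {name} not found in env or {mount_dir}")
```

Preferring the env var keeps local development simple, while the file fallback supports mounted Secrets that can be rotated without restarting the pod.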
```python
import os
from dataclasses import dataclass

@dataclass
class AppConfig:
    """Application configuration loaded from the environment."""
    firecrawl_api_key: str
    redis_url: str
    database_url: str
    openai_api_key: str = ""
    worker_concurrency: int = 5
    log_level: str = "info"
    cache_ttl_hours: int = 24

    @classmethod
    def from_env(cls) -> "AppConfig":
        """Load configuration from environment variables."""
        required = ["FIRECRAWL_API_KEY", "REDIS_URL", "DATABASE_URL"]
        missing = [var for var in required if var not in os.environ]
        if missing:
            raise EnvironmentError(f"Missing required env vars: {', '.join(missing)}")
        return cls(
            firecrawl_api_key=os.environ["FIRECRAWL_API_KEY"],
            redis_url=os.environ["REDIS_URL"],
            database_url=os.environ["DATABASE_URL"],
            openai_api_key=os.getenv("OPENAI_API_KEY", ""),
            worker_concurrency=int(os.getenv("WORKER_CONCURRENCY", "5")),
            log_level=os.getenv("LOG_LEVEL", "info"),
            cache_ttl_hours=int(os.getenv("CACHE_TTL_HOURS", "24")),
        )

# Usage
config = AppConfig.from_env()
```

Automate testing, building, and deployment to ensure consistent, reliable releases.
```yaml
# .github/workflows/deploy.yml
name: Build and Deploy

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt -r requirements-dev.txt
      - run: pytest tests/ -v --cov=src/

  build:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}/worker:${{ github.sha }}

  deploy:
    needs: build
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: |
          kubectl set image deployment/scrape-worker \
            worker=ghcr.io/${{ github.repository }}/worker:${{ github.sha }}
          kubectl rollout status deployment/scrape-worker --timeout=300s
```
```mermaid
flowchart LR
    A[Git Push] --> B[Run Tests]
    B --> C{Tests Pass?}
    C -- Yes --> D[Build Docker Image]
    C -- No --> E[Notify: Failed]
    D --> F[Push to Registry]
    F --> G[Deploy to Staging]
    G --> H[Smoke Tests]
    H --> I{Pass?}
    I -- Yes --> J[Deploy to Production]
    I -- No --> K[Rollback]
    J --> L[Monitor for 30min]
    L --> M{Healthy?}
    M -- Yes --> N[Done]
    M -- No --> K
    classDef pass fill:#e8f5e8,stroke:#1b5e20
    classDef fail fill:#ffebee,stroke:#b71c1c
    class N pass
    class E,K fail
```
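The smoke-test stage in this pipeline can be as simple as polling the health endpoint of the freshly deployed service until it answers. A minimal sketch, assuming the service exposes the `/health` endpoint built later in this chapter (the staging URL you pass in is a placeholder for your own environment):

```python
import time
import urllib.error
import urllib.request

def smoke_test(base_url: str, attempts: int = 10, delay: float = 3.0) -> bool:
    """Poll {base_url}/health until it returns HTTP 200 or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not up yet; retry after a short delay
        time.sleep(delay)
    return False

# Usage (placeholder URL): exit non-zero in CI if the check fails
# ok = smoke_test("http://staging.internal:8080")
```

Wiring this into the `deploy` job as a follow-up step gives the "Smoke Tests" gate shown in the flowchart.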
```python
import json
import logging
import sys
from datetime import datetime, timezone

class JSONFormatter(logging.Formatter):
    """Structured JSON log formatter for production."""

    def format(self, record):
        log_entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "function": record.funcName,
        }
        # Add extra fields attached via logger.info(..., extra={...})
        if hasattr(record, "url"):
            log_entry["url"] = record.url
        if hasattr(record, "duration_ms"):
            log_entry["duration_ms"] = record.duration_ms
        if hasattr(record, "status"):
            log_entry["status"] = record.status
        # Add exception info
        if record.exc_info:
            log_entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(log_entry)

# Configure structured logging
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("scraper")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Usage
logger.info("Scrape completed", extra={
    "url": "https://example.com",
    "duration_ms": 1250,
    "status": "success",
})
```
```python
import time
from urllib.parse import urlparse

from prometheus_client import Counter, Histogram, Gauge, start_http_server

# Define metrics
SCRAPE_TOTAL = Counter(
    "scrape_requests_total",
    "Total scrape requests",
    ["status", "domain"],
)
SCRAPE_DURATION = Histogram(
    "scrape_duration_seconds",
    "Scrape request duration",
    ["domain"],
    buckets=[0.5, 1, 2, 5, 10, 30, 60],
)
QUEUE_DEPTH = Gauge(
    "scrape_queue_depth",
    "Number of pending scrape jobs",
)
CACHE_HIT_RATE = Gauge(
    "scrape_cache_hit_rate",
    "Cache hit rate percentage",
)

# Start the metrics server
start_http_server(8080)

def instrumented_scrape(app, url):
    """Scrape with Prometheus instrumentation."""
    domain = urlparse(url).netloc
    start = time.time()
    try:
        result = app.scrape_url(url, params={"formats": ["markdown"]})
        duration = time.time() - start
        SCRAPE_TOTAL.labels(status="success", domain=domain).inc()
        SCRAPE_DURATION.labels(domain=domain).observe(duration)
        return result
    except Exception:
        SCRAPE_TOTAL.labels(status="error", domain=domain).inc()
        raise
```
```python
import json
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

# check_redis() and check_database() are assumed to be defined alongside
# this module, e.g. a Redis PING and a SELECT 1, each returning a bool.

class HealthHandler(BaseHTTPRequestHandler):
    """Simple health and readiness endpoints."""

    def do_GET(self):
        if self.path == "/health":
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(json.dumps({"status": "healthy"}).encode())
        elif self.path == "/ready":
            # Check each dependency once and reuse the results
            redis_ok = check_redis()
            db_ok = check_database()
            ready = redis_ok and db_ok
            self.send_response(200 if ready else 503)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(json.dumps({
                "ready": ready,
                "redis": redis_ok,
                "database": db_ok,
            }).encode())
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, format, *args):
        pass  # Suppress default request logging

def start_health_server(port=8080):
    """Start the health check server in a background thread."""
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```
```python
import subprocess
from datetime import datetime, timezone

def backup_chroma(chroma_path: str, backup_dir: str) -> str:
    """Back up ChromaDB to a compressed archive."""
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    backup_file = f"{backup_dir}/chroma_backup_{timestamp}.tar.gz"
    subprocess.run(
        ["tar", "-czf", backup_file, "-C", chroma_path, "."],
        check=True,
    )
    print(f"Backup created: {backup_file}")
    return backup_file

def backup_to_s3(local_path: str, s3_bucket: str, s3_prefix: str) -> None:
    """Upload a backup archive to S3."""
    import boto3

    s3 = boto3.client("s3")
    filename = local_path.split("/")[-1]
    s3_key = f"{s3_prefix}/{filename}"
    s3.upload_file(local_path, s3_bucket, s3_key)
    print(f"Uploaded to s3://{s3_bucket}/{s3_key}")
```

| Component | Frequency | Retention | Method |
|---|---|---|---|
| Vector database | Daily | 30 days | Snapshot + S3 |
| PostgreSQL | Every 6 hours | 14 days | pg_dump + S3 |
| Redis (queue state) | Hourly | 7 days | RDB snapshot |
| Application config | On every deploy | Forever | Git repository |
| Scraped content | On collection | Until re-scraped | S3 lifecycle rules |
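A backup schedule is only half the story; the restore path needs to exist and be exercised too. A minimal restore counterpart to `backup_chroma` above, assuming the same tar.gz archive layout:

```python
import subprocess
from pathlib import Path

def restore_chroma(backup_file: str, restore_path: str) -> None:
    """Extract a chroma_backup_*.tar.gz archive into restore_path.

    Stop any process using the ChromaDB directory before restoring,
    then restart it pointed at restore_path.
    """
    Path(restore_path).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["tar", "-xzf", backup_file, "-C", restore_path],
        check=True,
    )
    print(f"Restored {backup_file} -> {restore_path}")
```

Running this against last night's archive in a scratch directory, on a schedule, is a cheap way to prove your backups are actually restorable.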
```mermaid
flowchart TD
    A[Incident Detected] --> B{Severity}
    B -- "P1: Complete outage" --> C[Page on-call]
    B -- "P2: Degraded" --> D[Slack alert]
    B -- "P3: Minor" --> E[Create ticket]
    C --> F[Assess impact]
    D --> F
    F --> G{Root Cause}
    G -- "Queue backlog" --> H[Scale workers]
    G -- "API key expired" --> I[Rotate key]
    G -- "Database down" --> J[Failover DB]
    G -- "Vector DB corrupt" --> K[Restore from backup]
    H --> L[Monitor recovery]
    I --> L
    J --> L
    K --> L
    L --> M[Post-mortem]
    classDef critical fill:#ffebee,stroke:#b71c1c
    classDef warn fill:#fff3e0,stroke:#e65100
    classDef info fill:#e1f5fe,stroke:#01579b
    class C critical
    class D warn
    class E info
```
Use this checklist before deploying to production.
- All containers build and pass tests in CI
- Kubernetes deployments, HPAs, and PDBs configured
- Resource requests and limits set for all containers
- Persistent volumes provisioned for databases
- All secrets stored in secrets manager (never in code or config)
- API keys scoped to minimum required permissions
- Non-root users in all containers
- Network policies restrict inter-service communication
- HTTPS enforced on all external endpoints
- Structured JSON logging configured
- Prometheus metrics exposed and scraped
- Grafana dashboards created for key metrics
- Alerts configured for: error rate > 10%, latency > 30s, queue depth > 1000
- Log aggregation set up (Loki, CloudWatch, or equivalent)
- Health check and readiness endpoints implemented
- Graceful shutdown handles SIGTERM
- Retry logic with exponential backoff for all external calls
- Circuit breaker for Firecrawl API calls
- Backup schedule configured and tested
- Disaster recovery runbook documented and accessible
- Load test completed at 2x expected throughput
- Runbook documented for common failure scenarios
- On-call rotation defined
- Rollback procedure tested
- Cost monitoring and budget alerts configured
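The checklist's "retry logic with exponential backoff" item can be implemented as a small decorator around any external call; the attempt count, base delay, and jitter fraction below are illustrative defaults, not values mandated by Firecrawl:

```python
import random
import time
from functools import wraps

def retry_with_backoff(max_attempts: int = 5, base_delay: float = 1.0,
                       max_delay: float = 30.0):
    """Retry a function with exponential backoff plus a little jitter."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the error
                    # 1s, 2s, 4s, ... capped at max_delay, with ~10% jitter
                    delay = min(base_delay * 2 ** (attempt - 1), max_delay)
                    time.sleep(delay + random.uniform(0, delay * 0.1))
        return wrapper
    return decorator

@retry_with_backoff(max_attempts=3, base_delay=0.5)
def scrape_with_retries(app, url):
    return app.scrape_url(url, params={"formats": ["markdown"]})
```

Jitter matters at scale: without it, a fleet of workers that failed together retries together, hammering the API in synchronized waves.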
| Problem | Possible Cause | Solution |
|---|---|---|
| Pods crashing on start | Missing env vars or secrets | Check `kubectl describe pod` for error messages |
| OOM kills | Memory limit too low | Increase memory limit; profile memory usage |
| Slow autoscaling | HPA stabilization window too long | Reduce stabilization window; adjust metrics |
| Stale queue jobs | Workers crashed mid-processing | Add job timeout; implement dead-letter queue |
| High error rate after deploy | Code regression | Roll back with `kubectl rollout undo` |
| Backup restore fails | Incompatible versions | Test restores regularly; version-tag backups |
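The "stale queue jobs" fix above, retry counting plus a dead-letter queue, can be sketched with plain Redis lists. The key names and retry limit here are illustrative, and `client` can be a real `redis.Redis` instance or anything with an `lpush` method:

```python
import json
import time

class DeadLetterQueue:
    """Requeue failed jobs, or bury them after too many retries.

    Jobs are JSON-encoded dicts; buried jobs keep their last error
    and failure timestamp so an operator can inspect and replay them.
    """

    def __init__(self, client, pending_key="scrape:pending",
                 dead_key="scrape:dead", max_retries=3):
        self.client = client
        self.pending_key = pending_key
        self.dead_key = dead_key
        self.max_retries = max_retries

    def requeue_or_bury(self, job: dict, error: str) -> str:
        """Return "requeued" or "dead" depending on the retry count."""
        job = dict(job, retries=job.get("retries", 0) + 1,
                   last_error=error, failed_at=time.time())
        if job["retries"] >= self.max_retries:
            self.client.lpush(self.dead_key, json.dumps(job))
            return "dead"
        self.client.lpush(self.pending_key, json.dumps(job))
        return "requeued"
```

Pairing this with a per-job visibility timeout ensures jobs from crashed workers are requeued instead of sitting in limbo forever.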
- Use rolling deployments with `maxSurge: 1` and `maxUnavailable: 0` for zero-downtime deploys.
- Set resource requests accurately -- under-requesting causes noisy-neighbor issues; over-requesting wastes money.
- Use node affinity to place scraping workers on nodes with good network connectivity.
- Implement circuit breakers to stop calling failing APIs and let them recover.
- Profile in staging before production -- use realistic URL lists and load patterns.
- Rotate API keys regularly -- automate rotation with your secrets manager.
- Enable audit logging -- log who accessed what data and when.
- Scan container images for vulnerabilities with Trivy or Snyk.
- Enforce network policies -- workers should only reach Redis, the database, and Firecrawl's API.
- Never log sensitive data -- filter API keys, passwords, and PII from log output.
- Use RBAC in Kubernetes -- each service account should have minimal permissions.
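The circuit-breaker advice above can be sketched as a small wrapper around Firecrawl calls. The failure threshold and cool-down period are illustrative; tune them to your traffic:

```python
import time

class CircuitBreaker:
    """Open after N consecutive failures; probe again after a cool-down.

    While open, calls fail fast instead of hitting the struggling API,
    giving it room to recover.
    """

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 60.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when circuit opened

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: skipping call")
            self.opened_at = None  # half-open: allow one probe call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit
        return result

# Usage: breaker.call(app.scrape_url, url) instead of calling the API directly.
```

In the worker loop, wrap `app.scrape_url` with `breaker.call` and treat the fast `RuntimeError` as a signal to requeue the job for later rather than a hard failure.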
Production deployment transforms a development prototype into a reliable, scalable system. By containerizing your application, deploying on Kubernetes with autoscaling, securing credentials, implementing CI/CD, adding comprehensive observability, and preparing disaster recovery procedures, you build a system that can operate unattended and recover gracefully from failures.
- Containerize everything -- Docker containers make deployments reproducible and portable across environments.
- Use Kubernetes for orchestration -- HPAs, PDBs, and rolling deployments provide the reliability primitives you need.
- Never store secrets in code -- use environment variables backed by a secrets manager with automated rotation.
- Observability is not optional -- structured logs, Prometheus metrics, and alerting are essential for operating a production system.
- Test your recovery procedures -- a backup that has never been restored is not a backup.
Congratulations -- you have completed the entire Firecrawl tutorial. You now have the knowledge to:
- Scrape any website (static or JavaScript-rendered) and extract clean, structured data
- Build RAG pipelines that give your LLMs access to up-to-date web content
- Scale to thousands of pages with distributed workers, caching, and rate limiting
- Deploy to production with containers, Kubernetes, CI/CD, and full observability
From here, consider these advanced topics:
- Fine-tuning LLMs on your scraped corpus for domain-specific performance
- Building agentic systems that autonomously discover and scrape new sources
- Real-time scraping with webhooks and event-driven architectures
- Multi-modal extraction including images, PDFs, and video transcripts
Built with insights from the Firecrawl project.
Most teams struggle here because the hard part is not writing more code, but drawing clear boundaries between the scheduler, the workers, and the data stores so behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about Chapter 8: Production Deployment as an operating subsystem inside Firecrawl Tutorial: Building LLM-Ready Web Scraping and Data Extraction Systems, with explicit contracts for inputs, state transitions, and outputs.
Use the implementation notes around logging, scraping, and job bookkeeping as your checklist when adapting these patterns to your own repository.
Under the hood, a production deployment like this one usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for each worker.
- Input normalization: shape incoming jobs so the queue exposes stable contracts.
- Core execution: run the main scrape logic and propagate intermediate state through the worker loop.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit the logs and metrics needed for debugging and performance tuning.
When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.
To verify implementation details while reading this chapter, consult the upstream Firecrawl repository on GitHub.
Suggested trace strategy:
- search the upstream code for the worker and queue implementations to map concrete code paths
- compare documentation claims against the actual runtime and config code before reusing patterns in production