---
layout: default
title: "Chapter 6: Scaling"
parent: Deer Flow Tutorial
nav_order: 6
---
Welcome to Chapter 6: Scaling. In this part of the Deer Flow tutorial, you will build an intuitive mental model first, then move into concrete implementation details and practical production tradeoffs.

As workflow complexity and volume grow, Deer Flow must scale to meet demand. This chapter covers distributed architecture, horizontal scaling, load balancing, resource management, and performance optimization.
```
┌─────────────────────────────────────────────────────────────┐
│             Deer Flow Distributed Architecture              │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                      API Gateway                      │  │
│  │              (Load Balanced Entry Point)              │  │
│  └───────────────────────────┬───────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────┴───────────────────────────┐  │
│  │          Scheduler Cluster (Active-Passive)           │  │
│  │   ┌─────────┐      ┌─────────┐      ┌─────────┐       │  │
│  │   │Scheduler│      │Scheduler│      │Scheduler│       │  │
│  │   │(Active) │      │(Standby)│      │(Standby)│       │  │
│  │   └─────────┘      └─────────┘      └─────────┘       │  │
│  └───────────────────────────┬───────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────┴───────────────────────────┐  │
│  │                     Message Queue                     │  │
│  │              (Kafka / RabbitMQ / Redis)               │  │
│  └───────────────────────────┬───────────────────────────┘  │
│                              │                              │
│  ┌───────────────────────────┴───────────────────────────┐  │
│  │               Worker Pool (Auto-Scaling)              │  │
│  │ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │  │
│  │ │Worker│ │Worker│ │Worker│ │Worker│ │Worker│ │Worker│ │  │
│  │ │  1   │ │  2   │ │  3   │ │  N   │ │ N+1  │ │ ...  │ │  │
│  │ └──────┘ └──────┘ └──────┘ └──────┘ └──────┘ └──────┘ │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │                     Storage Layer                     │  │
│  │  ┌──────────┐    ┌───────────┐    ┌──────────┐        │  │
│  │  │PostgreSQL│    │    S3     │    │  Redis   │        │  │
│  │  │(Metadata)│    │(Artifacts)│    │ (Cache)  │        │  │
│  │  └──────────┘    └───────────┘    └──────────┘        │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
Deploy the scheduler and workers as separate Kubernetes Deployments. The scheduler runs three replicas (one active, two standby), while workers start at five replicas and scale out from there:

```yaml
# k8s/scheduler-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deerflow-scheduler
spec:
  replicas: 3
  selector:
    matchLabels:
      app: deerflow-scheduler
  template:
    metadata:
      labels:
        app: deerflow-scheduler
    spec:
      containers:
        - name: scheduler
          image: deerflow/scheduler:latest
          env:
            - name: REDIS_URL
              value: redis://redis:6379
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: deerflow-secrets
                  key: database-url
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
---
# k8s/worker-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deerflow-worker
spec:
  replicas: 5
  selector:
    matchLabels:
      app: deerflow-worker
  template:
    metadata:
      labels:
        app: deerflow-worker
    spec:
      containers:
        - name: worker
          image: deerflow/worker:latest
          env:
            - name: QUEUE_URL
              value: amqp://rabbitmq:5672
            - name: WORKER_CONCURRENCY
              value: "4"
          resources:
            requests:
              memory: "1Gi"
              cpu: "1"
            limits:
              memory: "4Gi"
              cpu: "2"
```
A HorizontalPodAutoscaler scales the worker Deployment on two signals: queue depth (an external RabbitMQ metric) and CPU utilization:

```yaml
# k8s/worker-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: deerflow-worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deerflow-worker
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: rabbitmq_queue_messages
          selector:
            matchLabels:
              queue: deerflow-tasks
        target:
          type: AverageValue
          averageValue: "10"
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
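With an `AverageValue` target of `"10"`, the HPA aims for roughly ten queued messages per worker; the standard formula is desiredReplicas = ceil(metricTotal / targetAverageValue), clamped to the replica bounds. A quick sanity check of that arithmetic (my sketch of the standard HPA math, not Deer Flow code):

```python
import math

def desired_replicas(queue_depth: int, target_per_worker: int = 10,
                     min_replicas: int = 3, max_replicas: int = 50) -> int:
    """Approximate the HPA AverageValue formula:
    desired = ceil(total_metric / target), clamped to [min, max]."""
    desired = math.ceil(queue_depth / target_per_worker)
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(240))  # 240 queued tasks -> 24 workers
print(desired_replicas(900))  # would be 90, capped at maxReplicas -> 50
```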
Alternatively, Deer Flow's built-in queue-based scaler can manage the worker count directly:

```python
from deerflow.scaling import QueueBasedScaler

scaler = QueueBasedScaler(
    queue="deerflow-tasks",
    min_workers=3,
    max_workers=100,
    scale_up_threshold=50,    # Queue depth that triggers scale up
    scale_down_threshold=5,   # Queue depth that triggers scale down
    scale_up_step=5,          # Workers to add per scaling action
    scale_down_step=2,        # Workers to remove per scaling action
    cooldown_period=300       # Seconds between scaling actions
)
scaler.start()
```
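The asymmetric steps (add five, remove two) and the cooldown are there to prevent flapping: scale up aggressively when work piles up, shed capacity slowly, and never react twice within five minutes. The decision rule those parameters imply looks roughly like this (an illustrative sketch, not Deer Flow's implementation):

```python
def next_worker_count(queue_depth: int, workers: int) -> int:
    """Sketch of the policy implied by QueueBasedScaler's parameters."""
    if queue_depth > 50:              # scale_up_threshold
        return min(100, workers + 5)  # respect max_workers, add scale_up_step
    if queue_depth < 5:               # scale_down_threshold
        return max(3, workers - 2)    # respect min_workers, drop scale_down_step
    return workers                    # in band (or cooling down): hold steady

assert next_worker_count(queue_depth=120, workers=10) == 15
assert next_worker_count(queue_depth=2, workers=10) == 8
```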
Separate worker pools let you match resources to workload type instead of sizing every worker for the worst case:

```yaml
# config/worker-pools.yaml
worker_pools:
  default:
    min_workers: 5
    max_workers: 20
    task_types: ["*"]
  cpu_intensive:
    min_workers: 2
    max_workers: 10
    task_types: ["python", "spark"]
    resources:
      cpu: 4
      memory: 8Gi
  io_bound:
    min_workers: 10
    max_workers: 50
    task_types: ["http", "sql"]
    resources:
      cpu: 1
      memory: 2Gi
  gpu:
    min_workers: 1
    max_workers: 5
    task_types: ["ml_inference", "training"]
    resources:
      cpu: 4
      memory: 16Gi
      gpu: 1
```
The task router decides which pool or worker receives each task; pick the strategy that matches your workload:

```python
from deerflow.routing import TaskRouter

router = TaskRouter()

# Round-robin (default): rotate through workers evenly
router.strategy = "round_robin"

# Least connections: send work to the least-busy worker
router.strategy = "least_connections"

# Weighted routing: bias traffic toward larger pools
router.strategy = "weighted"
router.weights = {
    "worker-pool-1": 3,
    "worker-pool-2": 2,
    "worker-pool-3": 1
}

# Consistent hashing (for stateful tasks): pin a key to one destination
router.strategy = "consistent_hash"
router.hash_key = "workflow_id"
```
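Consistent hashing matters for stateful tasks because every task carrying the same `workflow_id` lands on the same destination, so local state and caches stay warm. The pinning idea in miniature (a simplified sketch using modulo hashing; a real consistent-hash ring only remaps about 1/N of keys when a pool is added or removed):

```python
import hashlib

def pick_pool(workflow_id: str, pools: list) -> str:
    """Deterministically map a workflow to a pool: same id, same pool."""
    digest = hashlib.md5(workflow_id.encode()).hexdigest()
    return pools[int(digest, 16) % len(pools)]

pools = ["worker-pool-1", "worker-pool-2", "worker-pool-3"]
assert pick_pool("wf-42", pools) == pick_pool("wf-42", pools)  # stable pinning
```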
Priority queues keep urgent work from getting stuck behind bulk jobs; each queue gets its own worker ceiling and timeout:

```yaml
# config/queues.yaml
queues:
  critical:
    priority: 1
    max_workers: 20
    timeout: 60
  high:
    priority: 2
    max_workers: 15
    timeout: 300
  normal:
    priority: 3
    max_workers: 10
    timeout: 1800
  low:
    priority: 4
    max_workers: 5
    timeout: 7200
```
"id": "priority_task",
"type": "python",
"config": {"script": "critical_job.py"},
"routing": {
"queue": "critical",
"priority": 1,
"worker_pool": "cpu_intensive"
}
}# config/quotas.yaml
Quotas cap concurrency and resources per organization and per team, so one noisy team cannot exhaust the cluster:

```yaml
# config/quotas.yaml
quotas:
  organization:
    max_concurrent_workflows: 100
    max_concurrent_tasks: 1000
    cpu_limit: 500
    memory_limit: 2Ti
  teams:
    data-engineering:
      max_concurrent_workflows: 50
      max_concurrent_tasks: 500
      cpu_limit: 200
      memory_limit: 800Gi
    ml-team:
      max_concurrent_workflows: 20
      max_concurrent_tasks: 100
      cpu_limit: 100
      memory_limit: 500Gi
      gpu_limit: 10
```
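Enforcement boils down to an admission check before each task starts, against both the team's and the organization's counters. Roughly (an illustrative sketch, not Deer Flow's actual enforcement code):

```python
def can_start_task(team: str, usage: dict, quotas: dict) -> bool:
    """Admit a new task only if both team and org concurrency caps allow it."""
    team_q = quotas["teams"][team]
    org_q = quotas["organization"]
    return (usage[team]["tasks"] < team_q["max_concurrent_tasks"]
            and usage["_org"]["tasks"] < org_q["max_concurrent_tasks"])

quotas = {"organization": {"max_concurrent_tasks": 1000},
          "teams": {"ml-team": {"max_concurrent_tasks": 100}}}
usage = {"ml-team": {"tasks": 100}, "_org": {"tasks": 640}}
assert can_start_task("ml-team", usage, quotas) is False  # team cap reached
```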
@workflow.task(id="adaptive_task")
def process_data(context):
data_size = get_data_size()
# Request resources based on data size
if data_size > 1_000_000:
context.request_resources(
cpu=4,
memory="16Gi"
)
elif data_size > 100_000:
context.request_resources(
cpu=2,
memory="8Gi"
)
# Process with allocated resources
return process(data)# k8s/spot-workers.yaml
Spot or preemptible nodes cut worker cost substantially; the tradeoff is that they can be reclaimed at any time, so preemptible workers checkpoint frequently:

```yaml
# k8s/spot-workers.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deerflow-worker-spot
spec:
  replicas: 10
  selector:
    matchLabels:
      app: deerflow-worker-spot
  template:
    metadata:
      labels:
        app: deerflow-worker-spot
    spec:
      nodeSelector:
        node-type: spot
      tolerations:
        - key: "spot"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      containers:
        - name: worker
          image: deerflow/worker:latest
          env:
            - name: WORKER_TYPE
              value: "preemptible"
            - name: CHECKPOINT_INTERVAL
              value: "60"  # Frequent checkpoints for preemptible nodes
```
workflow = Workflow(name="cached_workflow")
@workflow.task(id="expensive_computation")
@cache(
ttl=3600,
key="${params.date}:${params.region}",
backend="redis"
)
def expensive_computation(context):
# This result will be cached
return compute_expensive_result()from deerflow import Workflow, batch
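The key template means results are cached per parameter combination, and `ttl` bounds staleness. In miniature, a TTL cache decorator follows the classic cache-aside shape (an in-process sketch for intuition; the decorator above uses a Redis backend instead of a local dict):

```python
import functools
import time

def ttl_cache(ttl: int):
    """Cache-aside sketch: memoize by key, recompute after expiry."""
    store = {}
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(key):
            entry = store.get(key)
            if entry and time.time() - entry[1] < ttl:
                return entry[0]               # fresh hit: skip the work
            value = fn(key)                   # miss or expired: recompute
            store[key] = (value, time.time())
            return value
        return wrapper
    return decorator

@ttl_cache(ttl=3600)
def report(key):
    return f"computed for {key}"

report("2024-01-01:us-east")  # computed
report("2024-01-01:us-east")  # served from cache
```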
workflow = Workflow(name="batch_workflow")
@workflow.task(id="batch_process")
@batch(size=100, parallel=10)
def process_items(items, context):
"""Process items in batches of 100, 10 batches in parallel."""
results = []
for item in items:
results.append(process_single(item))
return results# config/connections.yaml
Connection pooling keeps expensive connections (database, HTTP, Redis) warm and bounded:

```yaml
# config/connections.yaml
connections:
  database:
    pool_size: 20
    max_overflow: 10
    pool_timeout: 30
    pool_recycle: 3600
  http:
    pool_connections: 100
    pool_maxsize: 100
    max_retries: 3
  redis:
    max_connections: 50
```
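The database knobs mirror SQLAlchemy's QueuePool semantics (an assumption on my part, based on the matching parameter names): 20 persistent connections, up to 10 overflow connections under burst (30 total), a 30-second wait for a free connection, and hourly recycling to avoid stale TCP sessions. Configured directly, that would look like:

```python
from sqlalchemy import create_engine

# Hypothetical DSN; the pool arguments map 1:1 onto the YAML above.
engine = create_engine(
    "postgresql://deerflow:secret@db:5432/deerflow",
    pool_size=20,       # persistent connections kept open
    max_overflow=10,    # extra connections allowed under burst
    pool_timeout=30,    # seconds to wait for a free connection
    pool_recycle=3600,  # recycle connections older than one hour
)
```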
Multi-region deployment places schedulers and workers close to their data and survives regional outages; metadata replicates asynchronously from the primary:

```yaml
# config/regions.yaml
regions:
  us-east:
    primary: true
    scheduler: true
    workers: 20
    endpoints:
      api: https://us-east.deerflow.example.com
      queue: amqp://mq-us-east.internal
  us-west:
    primary: false
    scheduler: false
    workers: 15
    endpoints:
      api: https://us-west.deerflow.example.com
      queue: amqp://mq-us-west.internal
  eu-west:
    primary: false
    scheduler: true  # DR scheduler
    workers: 10
    endpoints:
      api: https://eu-west.deerflow.example.com
      queue: amqp://mq-eu-west.internal

replication:
  enabled: true
  mode: async
  lag_threshold: 30s
```
"name": "regional_workflow",
"affinity": {
"region": "us-east",
"fallback_regions": ["us-west", "eu-west"]
},
"tasks": [...]
}In this chapter, you've learned:
- Distributed Architecture: components and deployment
- Horizontal Scaling: auto-scaling workers
- Load Balancing: task distribution strategies
- Resource Management: quotas and dynamic allocation
- Performance: caching, batching, and connection pooling
- Multi-Region: geographic distribution

Best practices to carry forward:

- Scale workers: auto-scale based on queue depth
- Use pools: dedicate different pools to different workloads
- Set quotas: prevent resource exhaustion
- Cache results: avoid redundant computation
- Plan for regions: consider latency and disaster recovery
Ready to monitor and observe your workflows? Continue to Chapter 7: Monitoring.
Most teams struggle here because the hard part is not writing more code, but drawing clear boundaries between the scheduler, queue, and worker layers so behavior stays predictable as complexity grows.
In practical terms, this chapter helps you avoid three common failures:
- coupling core logic too tightly to one implementation path
- missing the handoff boundaries between setup, execution, and validation
- shipping changes without clear rollback or observability strategy
After working through this chapter, you should be able to reason about scaling as an operating subsystem inside the Deer Flow platform, with explicit contracts for inputs, state transitions, and outputs. Use the implementation notes around scheduler deployment, YAML configuration, and memory limits as your checklist when adapting these patterns to your own repository.
Under the hood, scaling usually follows a repeatable control path:
- Context bootstrap: initialize runtime config and prerequisites for Deer Flow.
- Input normalization: shape incoming data so each task receives stable contracts.
- Core execution: run the main logic branch and propagate intermediate state through the worker.
- Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
- Output composition: return canonical result payloads for downstream consumers.
- Operational telemetry: emit logs/metrics needed for debugging and performance tuning.

When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions; the sketch below makes the stages concrete.
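A self-contained sketch of that control path, with stub stages standing in for real Deer Flow components (all names here are illustrative, not Deer Flow APIs):

```python
def bootstrap(config):
    # 1. Context bootstrap: runtime config plus a place to collect telemetry
    return {"config": config, "logs": []}

def normalize(raw):
    # 2. Input normalization: downstream stages see a stable shape
    return {"task_id": str(raw.get("id", "unknown")),
            "payload": raw.get("payload", {})}

def execute(task, ctx):
    # 3. Core execution: run the main branch, record intermediate state
    ctx["logs"].append(f"executed {task['task_id']}")
    return {"task": task, "status": "ok"}

def enforce_policies(state, ctx):
    # 4. Policy and safety checks: fail loudly at the boundary
    if len(str(state["task"]["payload"])) > ctx["config"].get("max_payload", 1024):
        raise RuntimeError("payload exceeds policy limit")

def run(raw, config):
    ctx = bootstrap(config)
    state = execute(normalize(raw), ctx)
    enforce_policies(state, ctx)
    result = {"status": state["status"]}   # 5. Output composition
    print("telemetry:", ctx["logs"])       # 6. Operational telemetry
    return result

print(run({"id": 42, "payload": {"n": 7}}, {"max_payload": 1024}))
```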
Use the following upstream sources to verify implementation details while reading this chapter:
- Official Documentation (github.com)
- GitHub Repository (github.com)
- API Reference (github.com)
- Community & Issues (github.com)
- Workflow Examples (github.com)
- AI Codebase Knowledge Builder (github.com)
Suggested trace strategy:
- Search upstream code for `deerflow` and the relevant task `name` fields to map concrete implementation paths.
- Compare the docs' claims against the actual runtime/config code before reusing patterns in production.