1- # Worker Roles: Asynchronous Job Processing
1+ # Worker Roles: Asynchronous Job Processing
22
3- ** Status** : ✅ Production-Ready
3+ ** Status** : Production-Ready
44** Implementation** : Complete worker system with Kubernetes integration
55** Coverage** : Job queues, worker lifecycle, async offloading, scaling
66
77Enable horizontal scaling and asynchronous processing of provider operations. Offload long-running tasks to backend worker pods while API server returns immediately with job tracking IDs.
88
99---
1010
11- ## 🎯 Overview
11+ ## Overview
1212
1313The ** Worker Role system** transforms synchronous provider operations into asynchronous, scalable jobs:
1414
1515### Before (Synchronous, Blocking)
1616```
1717Client Request
1818 ↓
19- API Server ─→ Provider Logic (blocking for 5+ seconds) ─→ Response
19+ API Server --> Provider Logic (blocking for 5+ seconds) --> Response
2020 ↓
2121 Client waits for completion
2222```
@@ -25,7 +25,7 @@ API Server ─→ Provider Logic (blocking for 5+ seconds) ─→ Response
2525```
2626Client Request
2727 ↓
28- API Server ─→ Submit to Queue (instant) ─→ Response with job_id
28+ API Server --> Submit to Queue (instant) --> Response with job_id
2929 ↓
3030 Worker Pod (processes asynchronously)
3131 ↓
@@ -34,55 +34,28 @@ API Server ─→ Submit to Queue (instant) ─→ Response with job_id
3434
3535### Key Benefits
3636
37- ✅ ** Horizontal Scaling** — Scale worker pods independently from API
38- ✅ ** Async Processing** — Client gets immediate response with job_id
39- ✅ ** Load Distribution** — Multiple workers share operation workload
40- ✅ ** Resilience** — Failed jobs captured in dead-letter queue
41- ✅ ** Kubernetes Native** — Full RBAC, health checks, Prometheus metrics
42- ✅ ** Production Ready** — Used in ITL ControlPlane core infrastructure
37+ ** Horizontal Scaling** — Scale worker pods independently from API
38+ ** Async Processing** — Client gets immediate response with job_id
39+ ** Load Distribution** — Multiple workers share operation workload
40+ ** Resilience** — Failed jobs captured in dead-letter queue
41+ ** Kubernetes Native** — Full RBAC, health checks, Prometheus metrics
42+ ** Production Ready** — Used in ITL ControlPlane core infrastructure
4343
4444---
4545
46- ## 🏗️ Architecture
46+ ## Architecture
4747
4848### System Flow Diagram
4949
50- ```
51- ┌─────────────────────────────────────────────────────────────┐
52- │ API Server (Main) │
53- │ - Receives HTTP requests │
54- │ - Validates input │
55- │ - Submits jobs to queue │
56- │ - Returns job_id to client for tracking │
57- └──────────────────────┬──────────────────────────────────────┘
58- │
59- │ Submit Job
60- ▼
61- ┌─────────────────────────────────────────────────────────────┐
62- │ RabbitMQ Job Queue (Message Broker) │
63- │ - provider.jobs: Queue for pending jobs │
64- │ - provider.results: Queue for job results │
65- │ - provider.jobs.dlq: Dead-letter queue for failures │
66- └──────────────────────┬──────────────────────────────────────┘
67- │
68- │ Consume Job
69- ▼
70- ┌──────────────────────────────────────────────────────────────┐
71- │ Worker Pod #1 Worker Pod #2 Worker Pod #3 │
72- │ ┌────────────────────┐ ┌────────────────────┐ │
73- │ │ ProviderWorker │ │ ProviderWorker │ │
74- │ │ - Process jobs │ │ - Process jobs │ │
75- │ │ - Execute ops │ │ - Execute ops │ │
76- │ │ - Store results │ │ - Store results │ │
77- │ └────────────────────┘ └────────────────────┘ │
78- └──────────────────────┬──────────────────────────────────────┘
79- │
80- │ Publish Result
81- ▼
82- ┌─────────────────────────────────────────────────────────────┐
83- │ Result Store (Redis/Cache) │
84- │ (Clients poll this for job completion) │
85- └─────────────────────────────────────────────────────────────┘
50+ ``` mermaid
51+ flowchart TD
52+ A["API Server (Main)<br>- Receives HTTP requests<br>- Validates input<br>- Submits jobs to queue<br>- Returns job_id to client"]
53+ B["RabbitMQ Job Queue (Message Broker)<br>- provider.jobs: pending jobs<br>- provider.results: job results<br>- provider.jobs.dlq: dead-letter queue"]
54+ C["Worker Pods (x3) — ProviderWorker<br>- Process jobs - Execute ops - Store results"]
55+ D["Result Store (Redis/Cache)<br>Clients poll for job completion"]
56+ A -->|Submit Job| B
57+ B -->|Consume Job| C
58+ C -->|Publish Result| D
8659```
8760
8861### Components
@@ -97,7 +70,7 @@ API Server ─→ Submit to Queue (instant) ─→ Response with job_id
9770
9871---
9972
100- ## ⚡ Quick Start (5 Minutes)
73+ ## Quick Start (5 Minutes)
10174
10275### Step 1: Enable Workers in Helm
10376
@@ -161,7 +134,7 @@ kubectl port-forward svc/rabbitmq 15672:15672
161134
162135---
163136
164- ## 📚 Components & API Reference
137+ ## Components & API Reference
165138
166139### JobQueue
167140
@@ -200,11 +173,11 @@ result = await job_queue.get_result(
200173``` python
201174stats = await job_queue.get_queue_stats()
202175# {
203- # "connected": true,
204- # "queues": {
205- # "provider.jobs": {"message_count": 5, "consumer_count": 3},
206- # "provider.jobs.dlq": {"message_count": 0}
207- # }
176+ # "connected": true,
177+ # "queues": {
178+ # "provider.jobs": {"message_count": 5, "consumer_count": 3},
179+ # "provider.jobs.dlq": {"message_count": 0}
180+ # }
208181# }
209182```
210183
@@ -339,10 +312,10 @@ registry.register_worker(worker)
339312# Get status
340313status = registry.get_registry_status()
341314# {
342- # "total_workers": 3,
343- # "active_workers": 3,
344- # "total_jobs_processed": 1542,
345- # "total_jobs_failed": 3
315+ # "total_workers": 3,
316+ # "active_workers": 3,
317+ # "total_jobs_processed": 1542,
318+ # "total_jobs_failed": 3
346319# }
347320
348321# Start/stop all
@@ -352,7 +325,7 @@ await registry.stop_all_workers()
352325
353326-- -
354327
355- # # ⚙️ Configuration
328+ # # Configuration
356329
357330# ## Development (1 Worker)
358331
@@ -439,7 +412,7 @@ JOB_TIMEOUT=300 # Default job timeout (seconds)
439412
440413-- -
441414
442- # # 🚀 Deployment (Kubernetes/Helm)
415+ # # Deployment (Kubernetes/Helm)
443416
444417# ## Files Created
445418
@@ -487,7 +460,7 @@ kubectl exec <worker-pod> -- curl http://localhost:8001/health
487460
488461-- -
489462
490- # # 📊 Health & Monitoring
463+ # # Health & Monitoring
491464
492465# ## Health Check Endpoints
493466
@@ -557,7 +530,7 @@ kubectl port-forward svc/rabbitmq 15672:15672
557530
558531-- -
559532
560- # # 📈 Scaling Strategies
533+ # # Scaling Strategies
561534
562535# ## Horizontal Scaling (Number of Workers)
563536
@@ -607,7 +580,7 @@ await queue.submit_job(..., priority=1)
607580
608581-- -
609582
610- # # ⚠️ Error Handling & Resilience
583+ # # Error Handling & Resilience
611584
612585# ## Dead-Letter Queue (DLQ)
613586
@@ -674,7 +647,7 @@ class CustomWorker(ProviderWorker):
674647
675648-- -
676649
677- # # 🔒 Security
650+ # # Security
678651
679652# ## RBAC (ClusterRole)
680653
@@ -714,7 +687,7 @@ workers:
714687
715688-- -
716689
717- # # 🆘 Troubleshooting
690+ # # Troubleshooting
718691
719692# ## Workers Not Consuming Jobs
720693
@@ -796,7 +769,7 @@ helm upgrade --set workers.resources.limits.memory=1Gi
796769
797770-- -
798771
799- # # 💡 Real-World Examples
772+ # # Real-World Examples
800773
801774# ## Example 1: API with Offloading
802775
@@ -888,7 +861,7 @@ except asyncio.TimeoutError:
888861
889862-- -
890863
891- # # 👍 Best Practices
864+ # # Best Practices
892865
893866# ## Priority Management
894867
@@ -935,7 +908,7 @@ finally:
935908
936909-- -
937910
938- # # 🔗 Related Documentation
911+ # # Related Documentation
939912
940913- [08 - API_ENDPOINTS .md](08 - API_ENDPOINTS .md) - FastAPI integration
941914- [06 - HANDLER_MIXINS .md](06 - HANDLER_MIXINS .md) - Handler patterns
@@ -944,7 +917,7 @@ finally:
944917
945918-- -
946919
947- # # 📖 Quick Reference
920+ # # Quick Reference
948921
949922# ## Install
950923```bash
@@ -993,5 +966,5 @@ return {"job_id": response.job_id, "status": "pending"}
993966
994967** Document Version** : 1.0 (Consolidated from 4 docs)
995968** Last Updated** : February 14 , 2026
996- ** Status** : ✅ Production- Ready
969+ ** Status** : Production- Ready
997970
0 commit comments